Repository Pattern: What is the 'right size'?

I'm building some repositories for an MVC application, and I'm trying to come up with the right way to divide responsibilities between repositories. In most cases, this is obvious. But there is one particular case where I'm not sure what the right answer is.
The users of this application need to track multiple types of time for their employees. For simplicity, let's consider only two. I'll call them "time cards" and "attendance." The exact nature of the difference between these two is not really important, but you should note that the end users consider them entirely separate data. I think, though, that the reason they consider them entirely separate data is that they have never really had the opportunity to see them together in the past. The two types of records have almost entirely different business rules concerning editing, but they are also, generally speaking, both records of where an employee was at a particular time. Both types of time records have many properties in common, such as a total number of hours and an employee for whom the time was collected. Both types also have a few properties which are completely unique to the individual type. We're keeping these "extra" properties in an instance of another type. So the general structure looks like this:
// Properties shared by every kind of time record.
class TimeRecord
{
    Person Employee { get; set; }
    TimeSpan? Hours { get; set; }
}
// TProperty presumably stands in for the type of each type-specific property.
class TimeCardData
{
    TimeRecord Record { get; set; }
    TProperty TimeCardProperty { get; set; }
}
class AttendanceData
{
    TimeRecord Record { get; set; }
    TProperty AttendanceProperty { get; set; }
}
So the question is: how many repositories are required here?
1 Repository
A design with only one repository would expose methods to return "time cards", "attendance" records, or both types in one list. This is fairly convenient for clients of the repository, but, to my mind, has a real danger of becoming a very fat class. I think that a repository for just "time cards" is already going to be one of the largest repositories in the system even without also handling "attendance" simply due to the complex business rules involved.
2 Repositories
Another design would have one repository for "time cards" and another repository for "attendance" records. This has the advantage that the business rules for, e.g., "time cards" are in a place by themselves. But I'd also like to have a way to get a list of all time records, regardless of type. It's not clear which repository to use for this case. Both?
3 Repositories
A design with one repository for "time cards", another repository for "attendance" records, and a third repository to deliver a read-only list of all time records is also a possibility. Like the 2 repository design, this has the advantage that the business rules for, e.g., "time cards" are in a place by themselves. It's now clear where to get the combined list. But I find it a bit weird that I could get the same record from two different repositories.
Hybrid
A hybrid approach would use a single repository, but move any business rules code (including selection of records) into separate types. In this example, a single "time record repository" would aggregate instances of business rule implementation classes for "time card" and "attendance" time. I think this is the approach I'm favoring right now.
Other?
Anything I've missed? Any compelling arguments for one design over the other?

Repositories are not, at least to my knowledge, a place for business rules. They are just a facade meant to mimic a collection; underneath they're basically pure data access (if that's their job; you may not be persisting anything with a repository at all). So separate repositories should not be considered for "business rules" reasons.
If your domain objects are really separate objects, then you should have separate repositories. Remember what a repository is: it's a facade. It mimics a collection to your domain. See here for a really good blog post on Repositories: http://devlicio.us/blogs/casey/archive/2009/02/20/ddd-the-repository-pattern.aspx
The repository is a facade; an abstraction.
That said... I do not think you have separate objects. You've got some issues here that have nothing to do with repositories and everything to do with the domain and the design of the domain. Are the two types of "timecards" actually two different things, or are they really the same?
You say, "But I find it a bit weird that I could get the same record from two different repositories."
That tells me that they are actually the same data, expressed in different ways. And there are ways to handle that.
If this is really the case, then what you have here is subclasses of a common base class (something that can be modeled in a DB pretty easily and handled elegantly with NHibernate, for instance).
I'll give you an example of a project I am working on. I have something called a "Broadcast". It's a base class; abstract. Can't be instantiated. I have two specific concrete types of this class: DeviceBroadcast and FileBroadcast. One streams audio/video from a device (like a DirectX capture card) and one streams audio/video from a file source (like an .mp3).
I have one repository that returns a Broadcast object. I can cast it to a FileBroadcast to manipulate specific information about a FileBroadcast, or I can cast to a DeviceBroadcast for the same reason - if it is of that type. A Broadcast cannot be both a FileBroadcast and DeviceBroadcast type. It has to be one or the other.
In the database I store the generic broadcast parameters in a Broadcast table, and then I store the file specific properties in a FileBroadcast table. Same goes for the DeviceBroadcast table; separate. When I query via the Repository, however, I just want Broadcasts. That's my root aggregate object and thus that's my repository.
The Broadcast base class has common methods that both subclasses use (like the GetCommand() method, which returns a specific command-line argument to launch a VLC process). Subclasses have to override and implement that method because it's abstract. In this way, the "business logic" that is unique to a FileBroadcast is contained in the FileBroadcast class. The "business logic" that is unique to a DeviceBroadcast is contained in the DeviceBroadcast class. Any logic that is common to both is contained in the superclass, Broadcast.
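The hierarchy described above could be sketched roughly like this. It is only an illustration, not my actual project code: the property names (FilePath, DeviceName), the in-memory repository, and the command strings are all made up for the example, and the strings are placeholders rather than real VLC arguments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Abstract aggregate root: holds what every broadcast has in common.
public abstract class Broadcast
{
    public int Id { get; }
    public string Name { get; }

    protected Broadcast(int id, string name)
    {
        Id = id;
        Name = name;
    }

    // Each subclass supplies its own source-specific command.
    public abstract string GetCommand();
}

public sealed class FileBroadcast : Broadcast
{
    public string FilePath { get; } // hypothetical file-specific property

    public FileBroadcast(int id, string name, string filePath) : base(id, name)
    {
        FilePath = filePath;
    }

    // Placeholder text, not real VLC syntax.
    public override string GetCommand() => "play-file " + FilePath;
}

public sealed class DeviceBroadcast : Broadcast
{
    public string DeviceName { get; } // hypothetical device-specific property

    public DeviceBroadcast(int id, string name, string deviceName) : base(id, name)
    {
        DeviceName = deviceName;
    }

    // Placeholder text, not real VLC syntax.
    public override string GetCommand() => "capture-device " + DeviceName;
}

// One repository for the aggregate root; an in-memory stand-in for the example.
public sealed class InMemoryBroadcastRepository
{
    private readonly Dictionary<int, Broadcast> _items = new Dictionary<int, Broadcast>();

    public void Add(Broadcast broadcast) => _items[broadcast.Id] = broadcast;
    public Broadcast GetById(int id) => _items[id];
    public IReadOnlyList<Broadcast> GetAll() => _items.Values.ToList();
}

public static class Program
{
    public static void Main()
    {
        var repository = new InMemoryBroadcastRepository();
        repository.Add(new FileBroadcast(1, "Jingle", "jingle.mp3"));
        repository.Add(new DeviceBroadcast(2, "Studio", "capture-card-0"));

        // Callers work with Broadcast and never need to know the concrete type.
        foreach (Broadcast broadcast in repository.GetAll())
            Console.WriteLine(broadcast.Name + ": " + broadcast.GetCommand());

        // Cast only when type-specific data is needed.
        if (repository.GetById(1) is FileBroadcast file)
            Console.WriteLine(file.FilePath);
    }
}
```

Applied to the question, TimeRecord would play the role of Broadcast, with the time card and attendance types as its two subclasses.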
You seem to have a similar situation here and that's why I am sharing my design. I think it might serve you well.
Above all, think about your domain and the data. If you are going to get duplicate data by way of separate repositories, then you need to give more thought to how you're designing the domain. Don't let the users dictate your domain design either. They know the domain from their perspective. All you have to do is be able to present the data to them in a way they understand. That doesn't mean you have to have a bad design; you can have a good design behind the scenes because your code is the thing that has to use the domain.

Related

Entity/Domain purity dilemma in the clean architecture/Domain driven design

I'm working on an eCommerce system in which I try to implement the clean architecture.
But currently I'm a little bit stuck.
So I have a use case called CreateItemUseCase, in which I create an Item (alias product) for the shop.
In this use case I call a method (createItemEntity()) of an entity called ItemEntity.
This method creates just a data object with data like:
userId
itemTitle
itemDescription
...
So now I need another method in the ItemEntity which validates the userId.
To create an Item the user needs to have a userId, so the method in the ItemEntity would be called:
validateUserId()
This method should check if the user has a userId in the database, and if not, the Item creation would be impossible.
Now my question:
How do I validate the userId?
Should I have the validateUserId() method take an array as a parameter, in which all the user IDs are saved... something like this:
validateUserId(toBeValidated: Int, allUserIds: Array[Int])
{
// loop through the allUserIds to see if toBeValidated is in there ...
}
Or should I query the data in the method (which I'm pretty sure would violate the dependency rule) like this:
validateUserId(toBeValidated: Int)
{
// get all user id´s through a query, and check if toBeValidated is in there ...
}
Or should I do it completely differently?
In general, entities should only contain logic that is operating on information (data) that is within the entity's scope. Knowing how to query if a user with a certain user id exists or not is not in the scope of the item entity.
I think your motivation to keep all the logic for validation together is reasonable, but on the other hand you should not introduce infrastructure dependencies (like talking to the database or user repository) to the entity.
Or should I query the data in the method (which I'm pretty sure would violate the dependency rule) like this
Exactly, that's why it's usually best to avoid that and keep entities free from such dependencies. Introducing such dependencies can easily get out of hand and also makes testing such entities more complex. If you need to do that, it should be a carefully considered decision with a clear justification.
Should I have the validateUserId() method take an array as a parameter, in which all the user IDs are saved... something like this
This is not such a bad idea in general, because you would not make the entity dependent on infrastructure and provide the entity with all the data it needs for decision making. But on the other hand now you can run into another problem: bad performance.
Now you would retrieve all user ids every time you create an item. If you do the check for the user's existence somewhere else, this can be optimized much better.
I suggest asking the user repository whether the user exists before performing the entity creation, which includes all the other potentially required validations that make sense inside the item entity. The user repository could have a query that is optimized for just checking for the existence of this user by id.
In case these two operations (asking for the user's existence and creating the new item) only happen at one place of the application I'd be pragmatic and perform the user existence check directly in the use case. If this would occur from different places in your application you can extract that logic into a separate (domain) service (e.g. item service) which deals with the repetitive flow operations working with the user repository and item entity.
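Here is a rough sketch of that pragmatic variant, with the existence check done in the use case. Everything beyond CreateItemUseCase and ItemEntity is an assumption for illustration: the IUserRepository interface, its Exists method, the in-memory implementation, and the title check inside the entity are hypothetical names and rules, not part of the question.

```csharp
using System;
using System.Collections.Generic;

public interface IUserRepository
{
    // Ideally backed by a cheap existence query instead of loading every id.
    bool Exists(int userId);
}

public sealed class ItemEntity
{
    public int UserId { get; }
    public string Title { get; }
    public string Description { get; }

    private ItemEntity(int userId, string title, string description)
    {
        UserId = userId;
        Title = title;
        Description = description;
    }

    // The entity only validates data it owns (the title rule is a made-up example).
    public static ItemEntity Create(int userId, string title, string description)
    {
        if (string.IsNullOrWhiteSpace(title))
            throw new ArgumentException("An item needs a title.", nameof(title));
        return new ItemEntity(userId, title, description);
    }
}

public sealed class CreateItemUseCase
{
    private readonly IUserRepository _users;

    public CreateItemUseCase(IUserRepository users)
    {
        _users = users;
    }

    public ItemEntity Execute(int userId, string title, string description)
    {
        // The infrastructure-dependent check lives here, not in the entity.
        if (!_users.Exists(userId))
            throw new InvalidOperationException("Unknown user " + userId + ".");
        return ItemEntity.Create(userId, title, description);
    }
}

// In-memory stand-in for a real, database-backed repository.
public sealed class InMemoryUserRepository : IUserRepository
{
    private readonly HashSet<int> _ids;

    public InMemoryUserRepository(IEnumerable<int> ids)
    {
        _ids = new HashSet<int>(ids);
    }

    public bool Exists(int userId) => _ids.Contains(userId);
}

public static class Program
{
    public static void Main()
    {
        var useCase = new CreateItemUseCase(new InMemoryUserRepository(new[] { 1, 2, 3 }));

        ItemEntity item = useCase.Execute(2, "Lamp", "A desk lamp");
        Console.WriteLine(item.Title);

        try
        {
            useCase.Execute(99, "Chair", "No such owner");
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

The entity stays testable without any database, and the use case can be tested with a fake repository like the one above.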
What you are dealing with here is a trade-off between domain model purity, domain model completeness, and performance considerations. In this great blog this is named the Domain-Driven Design Trilemma. I suggest going through the reasoning in the article; I'm pretty sure it will help you come to a final decision.
I think this is a side case of what we call Business Gerunds.
Details: https://www.forbes.com/sites/forbestechcouncil/2022/05/19/10-best-practices-for-event-streaming-success/
If Item has to validate the user, look at which attributes the entities have in common and who is responsible for changing them. The entities can then be segregated in the DDD representation, and by using a composite with translation, entities from the outside world can still be treated as the same.

DDD and Entity Framework, Filters

So I am struggling with the approach DDD has to follow when we talk about filtering and queries. From the SO question "Is it okay to bypass the repository pattern for complex queries?" I can see the filtering by User should be done after getting all the Products. The piece of code from the accepted answer is:
Products products = /* get Products repository implementation */;
IList<Product> res = products.BoughtByUser(user);
But wait, and if the database has 1 million Products? Isn't the best approach to do this filter directly in the database like so:
productsRepository.Find(p => p.User.Id == userId);
But from my current understanding of DDD this would be wrong, because this logic should be inside the Product itself.
Therefore, how to handle this scenario?
I agree with Yorro's answer. According to the comment, products is indeed a repository.
The question of the performance of the underlying data structure vs keeping the domain knowledge in the application could be explored further, though.
Databases are great at filtering and querying data, they are optimized to do so, and for us to ignore that simply to "keep our knowledge in the domain" is naive.
Your example shows Repository Specialization, which is fine albeit verbose.
The logic of that search is encapsulated by that call, and as long as the interface for calling that method is in the domain, and the implementation in the data-layer, everything is fine.
Indeed the call could be to a stored-procedure that performs a very complex operation. (In this case, yes some of your logic has escaped the domain, but you make it as a conscious decision, and should you introduce another data technology, you would have to implement that functionality again.)
There is another option...
We can encapsulate the logic of the search in a Specification (http://en.wikipedia.org/wiki/Specification_pattern) and pass the specification from our domain logic code to our repository, which would interpret the specification and do the query.
That makes our domain oblivious of how the underlying data structure works, but it puts it in control of what the search criteria are.
I usually find myself implementing a blend of Repository Specialization, and having a base repository that accepts an ISpecification for more lightweight queries.
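As a rough sketch of the specification option (one possible shape, not a canonical one): the ISpecification interface below exposes a LINQ expression that the repository hands to its query source. The Criteria member, the BoughtByUserSpecification class, and the BuyerId property are assumptions made for the example, and an in-memory list stands in for the real data store.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public sealed class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int BuyerId { get; set; } // hypothetical simplification of p.User.Id
}

// The domain states WHAT it wants; the repository decides HOW to run it.
public interface ISpecification<T>
{
    Expression<Func<T, bool>> Criteria { get; }
}

public sealed class BoughtByUserSpecification : ISpecification<Product>
{
    private readonly int _userId;

    public BoughtByUserSpecification(int userId)
    {
        _userId = userId;
    }

    public Expression<Func<Product, bool>> Criteria => p => p.BuyerId == _userId;
}

public sealed class ProductRepository
{
    private readonly IQueryable<Product> _source;

    public ProductRepository(IQueryable<Product> source)
    {
        _source = source;
    }

    // With a LINQ provider that translates expressions, this filter can run in the database.
    public IList<Product> Find(ISpecification<Product> specification) =>
        _source.Where(specification.Criteria).ToList();
}

public static class Program
{
    public static void Main()
    {
        var products = new List<Product>
        {
            new Product { Id = 1, Name = "Lamp", BuyerId = 7 },
            new Product { Id = 2, Name = "Chair", BuyerId = 8 },
            new Product { Id = 3, Name = "Desk", BuyerId = 7 },
        };

        var repository = new ProductRepository(products.AsQueryable());
        foreach (Product product in repository.Find(new BoughtByUserSpecification(7)))
            Console.WriteLine(product.Name);
    }
}
```

Because the criteria are an expression tree rather than a compiled delegate, a query provider has the chance to translate them instead of filtering in memory.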
Based on your link, the Products class is the repository, just named without the "repository" suffix.
You are correct that the filtering should be in the database, you just don't see it because you are in the domain.
The first and second approaches are the same. The difference is that the first is more aligned with DDD because of the proper usage of the ubiquitous language.
// First example
// Take note, the products IS the repository
IList<Product> productsByUser = products.BoughtByUser(user);
// Second example
IList<Product> productsByUser = productsRepository.Find(p => p.User.Id == userId);
If you dive into the data access layer, you can see the filtering that you are talking about.
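// Needs "using System.Linq;" for Where and ToList.
public IList<Product> BoughtByUser(User user)
{
    // Assuming dbContext.Products is an Entity Framework DbSet: Find looks up by key,
    // so a predicate filter has to go through Where, which is translated to SQL.
    return this.dbContext.Products.Where(p => p.User.Id == user.Id).ToList();
}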
This is not a direct answer to your question (Yorro's answer is right) but maybe it helps you to better understand DDD. This is a "wrong way, turn back" answer.
Your views don't need domain rules; they don't need aggregates with 1 million children or 1 million entities. So you don't need to "bypass" the product repository, because you should have "View Services" with "View Repositories" which allow you to query (with paging, etc.) denormalized data from persistence for your views.
You should apply domain rules using aggregates/entities when an update/insert/delete is needed.
Once the user selects one or several products from the 1 million list and pushes, for example, the delete button, you should use the product repository to retrieve the aggregates/entities of the selected products, apply the delete rules and invariants, and save to persistence.
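To make the split concrete, here is a small sketch under assumptions of my own: ProductRow, ProductViewRepository, GetPage, and the "no open orders" delete rule are hypothetical names and rules invented for the example, and an in-memory sequence stands in for the denormalized view data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Read side: a flat row shaped for the screen, with no behaviour.
public sealed class ProductRow
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public sealed class ProductViewRepository
{
    private readonly IQueryable<ProductRow> _rows;

    public ProductViewRepository(IQueryable<ProductRow> rows)
    {
        _rows = rows;
    }

    // pageNumber is 1-based; only one page of rows is ever materialized.
    public IList<ProductRow> GetPage(int pageNumber, int pageSize) =>
        _rows.OrderBy(r => r.Id)
             .Skip((pageNumber - 1) * pageSize)
             .Take(pageSize)
             .ToList();
}

// Write side: the aggregate that guards its invariants.
public sealed class Product
{
    public int Id { get; }
    public bool HasOpenOrders { get; }
    public bool IsDeleted { get; private set; }

    public Product(int id, bool hasOpenOrders)
    {
        Id = id;
        HasOpenOrders = hasOpenOrders;
    }

    public void Delete()
    {
        // Hypothetical invariant, only here to show where such a rule belongs.
        if (HasOpenOrders)
            throw new InvalidOperationException("Product " + Id + " still has open orders.");
        IsDeleted = true;
    }
}

public static class Program
{
    public static void Main()
    {
        IQueryable<ProductRow> rows = Enumerable.Range(1, 1000)
            .Select(i => new ProductRow { Id = i, Name = "Product " + i })
            .AsQueryable();

        var view = new ProductViewRepository(rows);
        foreach (ProductRow row in view.GetPage(3, 10))
            Console.WriteLine(row.Id + " " + row.Name);

        // Only the product the user picked is loaded as an aggregate.
        var selected = new Product(21, hasOpenOrders: false);
        selected.Delete();
        Console.WriteLine(selected.IsDeleted);
    }
}
```

The list screen never touches the aggregate, and the aggregate is only loaded for the handful of products the user actually acts on.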

How do I represent multiple DTOs for a domain object in .NET Web API?

I'm writing a set of REST services and have come upon a problem that I'm sure has an appropriate solution/pattern that's just eluding me.
For instance, /api/People/1 will return a serialized representation of PersonDto (which is a pared-down representation of the Person domain object created by Entity Framework). I'm using AutoMapper to hydrate PersonDto.
However, a second controller (say, /api/Classes/) is going to return a different complex object, which may contain one or more Persons. I want to represent each person there in a different way than simply using the existing PersonDto (e.g. I might require more or fewer fields).
Do I need to define a ClassPersonDto? I'm not sure what the "proper" thing is to do here.
If the model of "person" being passed back in "Classes" is different from the "PersonDto" model, then yes, create a different model. You don't need to, but it's almost always better to keep your classes, including entities, as specific as possible.
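A minimal sketch of what that could look like. The field choices (Email, DisplayName) are invented for the example, and the mapping is written by hand here purely to keep the sketch self-contained; the same two target types would work with AutoMapper.

```csharp
using System;

// Domain object (in the question this comes from Entity Framework).
public sealed class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

// Shape returned by /api/People/{id}.
public sealed class PersonDto
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

// Slimmer shape embedded in the /api/Classes/ response.
public sealed class ClassPersonDto
{
    public int Id { get; set; }
    public string DisplayName { get; set; }
}

public static class PersonMappings
{
    public static PersonDto ToPersonDto(this Person person) => new PersonDto
    {
        Id = person.Id,
        FirstName = person.FirstName,
        LastName = person.LastName,
        Email = person.Email,
    };

    public static ClassPersonDto ToClassPersonDto(this Person person) => new ClassPersonDto
    {
        Id = person.Id,
        DisplayName = person.FirstName + " " + person.LastName,
    };
}

public static class Program
{
    public static void Main()
    {
        var person = new Person { Id = 1, FirstName = "Ada", LastName = "Lovelace", Email = "ada@example.com" };

        Console.WriteLine(person.ToPersonDto().Email);
        Console.WriteLine(person.ToClassPersonDto().DisplayName);
    }
}
```

Each endpoint then owns its own contract, so changing what /api/Classes/ shows about a person cannot accidentally change /api/People/.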

Validation on domain entities along with MVP

How do you apply validation in an MVP/domain environment?
Let me clarify with an example:
Domain entity:
class Customer
{
    string Name;
    // etc.
}
MVP-model
class CustomerModel
{
    string Name;
    // etc.
}
I want to apply validation on my domain entities, but the MVP model has its own model/class apart from the domain entity. Does that mean I have to copy the validation code to also work on the MVP model?
One solution I came up with is to drop the MVP model and use the domain entity as the MVP model, but I don't want to set data on the entities that isn't validated yet.
A second problem that arises is that if the entity has notify events, other parts of the application will be affected by faulty data.
A third thing with that approach is: if the user edits some data and then cancels the edit, how do I revert to the old values? (The entity might not come from a DB, so reloading the entity isn't possible in all cases.)
Another solution is to make some sort of copy/clone of the entity in question and use the copy as MVP-model, but then it might get troublesome if the entity has a large object graph.
Does anyone have any tips about these problems?
Constraining something like the name of a person probably does not rightfully belong in the domain model, unless in the client's company there is actually a rule that they don't do business with customers whose names exceed 96 characters.
String length and the like are not concerns of the domain -- two different applications employing the same model could have different requirements, depending on the UI, persistence constraints, and use cases.
On the one hand, you want to be sure that your model of a person is complete and accurate, but consider the "real world" person you are modeling. There are no rules about length and no logical corollary to "oops, there was a problem trying to give this person a name." A person just has a name, so I'd argue that it is the responsibility of the presenter to validate what the user enters before populating the domain model, because the format of the data is a concern of the application more so than the domain.
Furthermore, as Udi Dahan explains in his article, Employing the Domain Model Pattern, we use the domain model pattern to encapsulate rules that are subject to change. That a person should not have a null name is not a requirement that is likely ever to change.
I might consider using Debug.Assert() in the domain entity just for an added layer of protection through integration and/or manual testing, if I was really concerned about a null name sneaking in, but something like length, again, doesn't belong there.
Don't use your domain entities directly -- keep that presentation layer; you're going to need it. You laid out three very real problems with using entities directly (I think Udi Dahan's article touches on this as well).
Your domain model should not acquiesce to the needs of the application, and soon enough your UI is going to need an event or collection filter that you're just going to have to stick into that entity. Let the presentation layer serve as the adapter instead and each layer will be able to maintain its integrity.
Let me be clear that the domain model does not have to be devoid of validation, but the validation that it contains should be domain-specific. For example, when attempting to give someone a pay raise, there may be a requirement that no raise can be awarded within 6 months of the last one so you'd need to validate the effective date of the raise. This is a business rule, is subject to change, and absolutely belongs in the domain model.
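The pay raise rule could be sketched like this. It is only an illustration of where such a rule lives: the Employee class, the AwardRaise method, and the property names are assumptions for the example, and "6 months" is interpreted as calendar months via DateTime.AddMonths.

```csharp
using System;

public sealed class Employee
{
    public decimal Salary { get; private set; }
    public DateTime? LastRaiseDate { get; private set; }

    public Employee(decimal salary, DateTime? lastRaiseDate = null)
    {
        Salary = salary;
        LastRaiseDate = lastRaiseDate;
    }

    public void AwardRaise(decimal amount, DateTime effectiveDate)
    {
        // Business rule: no raise within 6 months of the previous one.
        if (LastRaiseDate.HasValue && effectiveDate < LastRaiseDate.Value.AddMonths(6))
            throw new InvalidOperationException(
                "A raise cannot take effect within 6 months of the last one.");

        Salary += amount;
        LastRaiseDate = effectiveDate;
    }
}

public static class Program
{
    public static void Main()
    {
        var employee = new Employee(50000m, new DateTime(2024, 1, 15));

        try
        {
            employee.AwardRaise(2000m, new DateTime(2024, 6, 1));
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }

        employee.AwardRaise(2000m, new DateTime(2024, 7, 15));
        Console.WriteLine(employee.Salary);
    }
}
```

If the business later changes the waiting period, the change happens in one place in the domain model, while format checks such as name length stay in the presenter.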

Fat Domain Models => Inefficient?

Looking at DDD, we abstract the database into the various models which we operate on and look at it as a repository where our models live. Then we add the Data Layers and the Service/Business layers on top of it. My question is, in doing so, are we creating inefficiencies in data transfer by building fat models?
For example, say we have system that displays an invoice for a customer on the screen.
Thinking of it in terms of OOP, we'd probably end up with an object that looks somewhat like this:
class Invoice {
    Customer _customer;
    OrderItems _orderitems;
    ShippingInfo _shippingInfo;
}
class Customer {
    string name;
    int customerID;
    Address customerAddress;
    AccountingInfo accountingInfo;
    ShoppingHistory customerHistory;
}
(for the sake of the question/argument,
let's say it was determined that the customer class had to
implement AccountingInfo and ShoppingHistory)
If the invoice solely needs to print the customer name, why would we carry all the other baggage with it? Using the repository type of approach seems like we would be building these complex domain objects which require all these resources (CPU, memory, complex query joins, etc) AND then transferring it over the tubes to the client.
Simply adding a customerName property to the invoice class would be breaking away from abstractions and seems like a horrible practice. On the third hand, half filling an object like the Customer seems like a very bad idea, as you could end up creating multiple versions of the same object (e.g. one that has an Address but no ShoppingHistory, and one that has AccountingInfo but no Address, etc.). What am I missing, or not understanding?
Good object-relational mappers can lazy load the relationships, so you would pull back the customer for your invoice but ignore their accounting and shopping history. You could roll your own if you're not using an object-relational mapper.
Often you can't do this within your client, as you'll have crossed your transaction boundary (ended your database transaction), and so it is up to your service layer to ensure the right data has been loaded.
Testing the right data is available (and not too much of it) is often good to do in unit tests on a service layer.
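As an illustration of the "roll your own" lazy loading mentioned in this answer, here is a minimal sketch built on System.Lazy<T>. The loader delegate, the OrderCount property, and HistoryLoaded are hypothetical; in a real system the delegate would run a query, and it would have to run while the database session is still open.

```csharp
using System;

public sealed class ShoppingHistory
{
    public int OrderCount { get; set; }
}

public sealed class Customer
{
    private readonly Lazy<ShoppingHistory> _history;

    public string Name { get; }

    // loadHistory is a stand-in for the query that fetches the history.
    public Customer(string name, Func<ShoppingHistory> loadHistory)
    {
        Name = name;
        _history = new Lazy<ShoppingHistory>(loadHistory);
    }

    // The loader runs on first access only; later reads reuse the value.
    public ShoppingHistory History => _history.Value;

    public bool HistoryLoaded => _history.IsValueCreated;
}

public static class Program
{
    public static void Main()
    {
        int queries = 0;
        var customer = new Customer("Acme Ltd", () =>
        {
            queries++;
            return new ShoppingHistory { OrderCount = 12 };
        });

        // Printing an invoice only needs the name, so nothing is loaded.
        Console.WriteLine(customer.Name + " loaded=" + customer.HistoryLoaded);

        // Touching the history triggers exactly one load.
        Console.WriteLine(customer.History.OrderCount + " queries=" + queries);
    }
}
```

A service-layer unit test can assert on something like HistoryLoaded to check that an operation pulls in the data it needs and no more.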
You say "it was determined that the customer class had to implement AccountingInfo and ShoppingHistory", so clearly displaying an invoice is NOT the only task that the system performs (how else was it "determined" that customers need those other functionalities otherwise?-).
So you need a table of customers anyway (for those other functionalities) -- and of course your invoice printer needs to get customer data (even just the name) from that one table, the same one that's used by other functionality in the system.
So the "overhead" is purely illusory -- it appears to exist when you look at one functionality in isolation, but doesn't exist at all when you look at the whole system as an integrated whole.
