This question has been bothering me for some time now (I hope I'm not the only one). I want to take a typical 3-tier Java EE app and see how it could look implemented with actors. I would like to find out whether such a transition actually makes sense, and how I could profit from it if it does (maybe performance, better architecture, extensibility, maintainability, etc.).
Here are typical Controller (presentation), Service (business logic), DAO (data):
trait UserDao {
  def getUsers(): List[User]
  def getUser(id: Int): User
  def addUser(user: User)
}

trait UserService {
  def getUsers(): List[User]
  def getUser(id: Int): User
  def addUser(user: User): Unit

  @Transactional
  def makeSomethingWithUsers(): Unit
}

@Controller
class UserController {
  @Get
  def getUsers(): NodeSeq = ...

  @Get
  def getUser(id: Int): NodeSeq = ...

  @Post
  def addUser(user: User): Unit = { ... }
}
You can find something like this in many Spring applications. Let's take a simple implementation with no shared state, and therefore no synchronized blocks: all state lives in the database, and the application relies on transactions. Service, controller and DAO each have only one instance. The application server uses a separate thread for each request, but the threads will not block each other (they only block on database IO).
Suppose we are trying to implement similar functionality with actors. It can look like this:
sealed trait UserActions
case object GetUsers extends UserActions
case class GetUser(id: Int) extends UserActions
case class AddUser(user: User) extends UserActions
case object MakeSomethingWithUsers extends UserActions

val dao = actor {
  case GetUsers => ...
  case GetUser(userId) => ...
  case AddUser(user) => ...
}

val service = actor {
  case GetUsers => ...
  case GetUser(userId) => ...
  case AddUser(user) => ...
  case MakeSomethingWithUsers => ...
}

val controller = actor {
  case Get("/users") => ...
  case Get("/user", userId) => ...
  case Post("/add-user", user) => ...
}
I think it's not very important here how Get() and Post() extractors are implemented. Suppose I write a framework to implement this. I can send message to controller like this:
controller !! Get("/users")
The same thing would be done by the controller and the service. In this case the whole workflow would be synchronous. Even worse, I could process only one request at a time (in the meantime all other requests would land in the controller's mailbox). So I need to make it all asynchronous.
Is there any elegant way to perform each processing step asynchronously in this setup?
As far as I understand, each tier should somehow save the context of the message it receives and then send a message to the tier beneath. When the tier beneath replies with some result message, I should be able to restore the initial context and reply with this result to the original sender. Is this correct?
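Concretely, the bookkeeping I have in mind would look something like this rough, library-free sketch (all names are invented for illustration; a real actor library would hide most of this):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// A tier that forwards work downstream without blocking: it stores the
// original reply callback under a correlation ID and restores it when
// the downstream tier answers.
public class ServiceTier {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, Consumer<String>> pending = new ConcurrentHashMap<>();

    // Called by the tier above: remember who to answer, then hand the
    // request (together with the correlation ID) to the tier below.
    public long handleRequest(String request, Consumer<String> replyTo) {
        long id = nextId.incrementAndGet();
        pending.put(id, replyTo);
        // here a real system would send (id, request) to the DAO actor's mailbox
        return id;
    }

    // Called when the tier below replies: restore the saved context
    // and forward the result to the original sender.
    public void handleReply(long id, String result) {
        Consumer<String> replyTo = pending.remove(id);
        if (replyTo != null) {
            replyTo.accept(result);
        }
    }
}
```

In Akka this is largely done for you: `forward` preserves the original sender, so a lower tier can reply directly to the controller without the middle tier storing anything.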
Moreover, at the moment I have only one actor instance per tier. Even if they work asynchronously, I can still process only one controller, one service and one DAO message in parallel. This means I need more actors of the same type, which leads me to a LoadBalancer for each tier. It also means that if I have a UserService and an ItemService, I have to load-balance both of them separately.
I have a feeling that I'm getting something wrong here. All the configuration this requires seems overcomplicated. What do you think?
(PS: It would also be very interesting to know how DB transactions fit into this picture, but I think that's overkill for this thread.)
Avoid asynchronous processing unless and until you have a clear reason for doing it. Actors are lovely abstractions, but even they don't eliminate the inherent complexity of asynchronous processing.
I discovered that truth the hard way. I wanted to insulate the bulk of my application from the one real point of potential instability: the database. Actors to the rescue! Akka actors in particular. And it was awesome.
Hammer in hand, I then set about bashing every nail in view. User sessions? Yes, they could be actors too. Um... how about that access control? Sure, why not! With a growing sense of un-ease, I turned my hitherto simple architecture into a monster: multiple layers of actors, asynchronous message passing, elaborate mechanisms to deal with error conditions, and a serious case of the uglies.
I backed out, mostly.
I retained the actors that were giving me what I needed - fault-tolerance for my persistence code - and turned all of the others into ordinary classes.
May I suggest that you carefully read the Good use case for Akka question/answers? That may give you a better understanding of when and how actors will be worthwhile. Should you decide to use Akka, you might like to view my answer to an earlier question about writing load-balanced actors.
Just riffing, but...
I think if you want to use actors, you should throw away all previous patterns and dream up something new, then maybe re-incorporate the old patterns (controller, dao, etc) as necessary to fill in the gaps.
For instance, what if each User is an individual actor sitting in the JVM, or via remote actors, in many other JVMs. Each User is responsible for receiving update messages, publishing data about itself, and saving itself to disk (or a DB or Mongo or something).
I guess what I'm getting at is that all your stateful objects can be actors just waiting for messages to update themselves.
(For HTTP (if you wanted to implement that yourself), each request spawns an actor that blocks until it gets a reply (using !? or a future), which is then formatted into a response. You can spawn a LOT of actors that way, I think.)
When a request comes in to change the password for user "foo@example.com", you send a message to 'Foo@Example.Com' ! ChangePassword("new-secret").
Or you have a directory process which keeps track of the locations of all User actors. The UserDirectory actor can itself be an actor (one per JVM) which receives messages about which User actors are currently running and what their names are, relays messages to them from the Request actors, and delegates to other federated Directory actors. You'd ask the UserDirectory where a User is, and then send your message directly. The UserDirectory actor is responsible for starting a User actor if one isn't already running. The User actor recovers its state, then accepts updates.
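A toy sketch of that directory idea, with a plain map standing in for the real actor plumbing (all names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A directory that looks up a per-user "actor" (here just a mailbox),
// starting one on demand, and relays messages to it.
public class UserDirectory {
    static class UserActor {
        final String name;
        final List<String> mailbox = new ArrayList<>();
        UserActor(String name) { this.name = name; }
        synchronized void tell(String message) { mailbox.add(message); }
    }

    private final Map<String, UserActor> users = new ConcurrentHashMap<>();

    // Start the actor if it isn't already running, then relay the message.
    public void tell(String userName, String message) {
        users.computeIfAbsent(userName, UserActor::new).tell(message);
    }

    public int mailboxSize(String userName) {
        UserActor a = users.get(userName);
        return a == null ? 0 : a.mailbox.size();
    }
}
```

In a real system the directory would also handle actor time-outs, state recovery, and federation with directories on other JVMs, as described above.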
Etc, and so on.
It's fun to think about. Each User actor, for instance, can persist itself to disk, time out after a certain time, and even send messages to Aggregation actors. For instance, a User actor might send a message to a LastAccess actor. Or a PasswordTimeoutActor might send messages to all User actors, telling them to require a password change if their password is older than a certain date. User actors can even clone themselves onto other servers, or save themselves into multiple databases.
Fun!
Large compute-intensive atomic transactions are tricky to pull off, which is one reason why databases are so popular. So if you are asking whether you can transparently and easily use actors to replace all the transactional and highly-scalable features of a database (whose power you are very heavily leaning on in the Java EE model), the answer is no.
But there are some tricks you can play. For example, if one actor seems to be causing a bottleneck, but you don't want to go to the effort of creating a dispatcher/worker farm structure, you may be able to move the intensive work into futures:
val service = actor {
  ...
  case m @ MakeSomethingWithUsers =>
    Futures.future { sender ! myExpensiveOperation(m) }
}
This way, the really expensive tasks get spawned off in new threads (assuming that you don't need to worry about atomicity, deadlocks and so on, which you may, but again, solving those problems is not easy in general) and messages get sent along to wherever they should be going regardless.
For transactions with actors, you should take a look at Akka's "Transactors", which combine actors with STM (software transactional memory): http://doc.akka.io/transactors-scala
It's pretty great stuff.
As you said, !! = blocking = bad for scalability and performance, see this:
Performance between ! and !!
The need for transactions usually occurs when you are persisting state instead of events.
Please have a look at CQRS and DDDD (Distributed Domain Driven Design) and Event Sourcing, because, as you say, we still haven't got a distributed STM.
Consider an HTTP controller endpoint that accepts requests, validates them, and then returns an ack, while in the background it does some "heavy work".
There are two approaches with Reactor (that I'm interested in) to achieve this:
First approach
@PostMapping(..)
fun acceptRequest(request: Request): Response {
    if (isValid(request)) {
        Mono.just(request)
            .flatMap(service::doHeavyWork)
            .subscribe(...)
        return Response(202)
    } else {
        return Response(400)
    }
}
Second approach
class Controller {
    private val service = ...
    private val sink = Sinks.many().unicast().onBackpressureBuffer<Request>()
    private val stream = sink.asFlux().flatMap(service::doHeavyWork).subscribe(..)

    fun acceptRequest(request: Request): Response {
        if (isValid(request)) {
            sink.tryEmitNext(request) // for simplicity not handling errors
            return Response(202)
        } else {
            return Response(400)
        }
    }
}
Which approach is better/worse and why?
The reason I'm asking is that in Akka, building streams on demand was not really efficient, since the stream needed to be materialized every time, so it was better to use the "sink approach". I'm wondering whether this might be the case for Reactor as well, or whether there are other advantages/disadvantages to these approaches.
I'm not too familiar with Akka, but building a reactive chain definitely doesn't attract a huge overhead with Reactor - that's the "normal" way of handling a request. So I don't see the need to use a separate sink like in your second approach - that just seems to be adding complexity for little gain. I'd therefore say the first approach is better.
That being said, generally, subscribing yourself as you do in both examples isn't recommended, but this kind of "fire and forget" work is one of the few cases where it might make sense. There are just a couple of other potential caveats I'd raise here that may be worth considering:
You call the work "heavy", and I'm not sure if that means it's CPU heavy, or just IO heavy or takes a long time. If it just takes a long time due to firing off a bunch of requests, then that's no big deal. If it's CPU heavy however, then that could cause an issue if you're not careful - you probably don't want to execute CPU heavy tasks on your event loop threads. In this case, I'd probably create a separate scheduler backed by your own executor service, and then use subscribeOn() to offload those CPU heavy tasks.
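To illustrate that offloading idea without a Reactor dependency, here is the same pattern with a plain CompletableFuture and a dedicated pool (names invented); in Reactor itself you would wrap such a pool via Schedulers.fromExecutorService(...) and apply it with subscribeOn(...):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Offloading CPU-heavy work to a dedicated pool so it never runs on the
// threads that accept requests.
public class HeavyWorkOffload {
    // A bounded pool sized for CPU-bound tasks, separate from the
    // event-loop / request-accepting threads. Daemon threads so the
    // pool never keeps the JVM alive on its own.
    private static final ExecutorService CPU_POOL =
        Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors(),
            r -> {
                Thread t = new Thread(r, "cpu-worker");
                t.setDaemon(true);
                return t;
            });

    // Fire-and-forget entry point: hand the heavy part to the pool.
    public static CompletableFuture<Long> accept(long n) {
        return CompletableFuture.supplyAsync(() -> heavySum(n), CPU_POOL);
    }

    // Stand-in for real CPU-heavy work.
    static long heavySum(long n) {
        long s = 0;
        for (long i = 1; i <= n; i++) s += i;
        return s;
    }
}
```

The important part is the dedicated executor: the request-accepting threads only enqueue work and return immediately.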
Remember the "fire and forget" pattern in this case really is "forget" - you have absolutely no way of knowing if the heavy task you've offloaded has worked or not, since you've essentially thrown that information away by self-subscribing. Depending on your use-case that might be fine, but if the task is critical or you need some kind of feedback if it fails, then it's worth considering that may not be the best approach.
I use event sourcing to store my object.
Changes are captured via domain events, holding only the minimal information required, e.g.
GroupRenamedDomainEvent
{
    string Name;
}

GroupMemberAddedDomainEvent
{
    int MemberId;
    string Name;
}
However elsewhere in my application I want to be notified if a Group is updated in general. I don’t want to have to accumulate or respond to a bunch of more granular and less helpful domain events.
My ideal event to subscribe to is:
GroupUpdatedIntegrationEvent
{
    int Id;
    string Name;
    List<Member> Members;
}
So what I have done is the following:
Update group aggregate.
Save down generated domain events.
Use these generated domain events to decide whether to trigger my integration event.
For the example above, this might look like:
var groupAggregate = _groupAggregateRepo.Load(id);
groupAggregate.Rename("Test");
groupAggregate.AddMember(1, "John");
_groupAggregateRepo.Save(groupAggregate);

var domainEvents = groupAggregate.GetEvents();
if (domainEvents.Any())
{
    _integrationEventPublisher.Publish(
        new GroupUpdatedIntegrationEvent
        {
            Id = id,
            Name = groupAggregate.Name,
            Members = groupAggregate.Members
        });
}
This means my integration events used throughout the application are not coupled to what data is used in my event sourcing domain events.
Is this a sensible idea? Has anyone got a better alternative? Am I misunderstanding these terms?
Of course you're free to create and publish as many events as you want, but I don't see (m)any benefits there.
You still have coupling: you just shifted the coupling from one Event to another. Here it really depends on how many Event Consumers you've got, whether everything is running in-memory or stored in a DB, and whether your Consumers need some kind of Replay mechanism.
Your integration Events can grow over time and use much bandwidth: If your Group contains 1000 Members and you add 5 new Members, you'll trigger 5 integration events that always contain all members, instead of just the small delta. It'll use much more network bandwidth and hard drive space (if persisted).
You're coupling your Integration Event to your Domain Model. I think this is not good at all. You won't be able to simply change the Member class in the future, because all Event Consumers depend on it. A solution could be to instead use a separate MemberDTO class for the Integration Event and write a MemberToMemberDTO converter.
Your Event Consumers can't decide which changes they want to handle, because they always just receive the full blown current state. The information what actually changed is lost.
The only real benefit I see is that you don't have to again write code to apply your Domain Events.
In general it looks a bit like Read Model in CQRS. Maybe that's what you're looking for?
But of course it depends. If your solution fits your application's needs, then it'll be fine to do it that way. Rules are made to show you the right direction, but they're also meant to be broken when they get in your way (and you know what you're doing).
Let's say we have two services A and B. B has a relation to A so it needs to know about the existing entities of A.
Service A publishes events every time an entity is created or updated. Service B subscribes to the events published by A and therefore knows about the entities existing in service A.
Problem: The client (UI or other microservices) creates a new entity 'a' and right away creates a new entity 'b' with a reference to 'a'. This happens without much delay, so what happens if service B did not receive/handle the event from A before getting the create request for 'b' with its reference to 'a'?
How should this be handled?
Service B must fail and the client should handle this and possibly do retry.
Service B accepts the entity and over time expect the relation to be fulfilled when the expected event is received. Service B provides a state for the entity that ensures it cannot be trusted before the relation have been verified.
It is poor design that the client can/has to do these two calls in the same transaction. The design should be different. How?
Other ways?
I know that event platforms like Kafka ensure very fast event delivery, but there will always be some delay, and since this is an asynchronous process there will be a kind of race condition.
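To make the second option more concrete, here is a rough sketch (all names invented) of service B accepting the entity in a pending state and verifying the relation once the event from A arrives:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Service B accepts an entity whose reference to A may not be known yet,
// marks it PENDING_REFERENCE, and verifies it when the event from A arrives.
public class ServiceB {
    public enum State { PENDING_REFERENCE, VERIFIED }

    private final Set<String> knownAs = ConcurrentHashMap.newKeySet(); // ids replicated from A
    private final Map<String, String> refOf = new ConcurrentHashMap<>(); // b-id -> referenced a-id
    private final Map<String, State> states = new ConcurrentHashMap<>();

    // Create request for 'b'; may arrive before the "a created" event.
    public State createB(String bId, String refToA) {
        refOf.put(bId, refToA);
        State s = knownAs.contains(refToA) ? State.VERIFIED : State.PENDING_REFERENCE;
        states.put(bId, s);
        return s;
    }

    // Event handler for "entity created in A": verify any waiting entities.
    public void onACreated(String aId) {
        knownAs.add(aId);
        refOf.forEach((bId, ref) -> {
            if (ref.equals(aId)) states.put(bId, State.VERIFIED);
        });
    }

    public State getState(String bId) { return states.get(bId); }
}
```

Until an entity reaches VERIFIED, service B would refuse to act on it, which is what "cannot be trusted before the relation has been verified" means in practice.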
What you're asking about falls under the general category of bridging the gap between Eventual Consistency and good User Experience which is a well-documented challenge with a distributed architecture. You have to choose between availability and consistency; typically you cannot have both.
Your example raises the question as to whether the service boundaries are appropriate. It's a common mistake to define microservice boundaries around Entities, but that's an anti-pattern. Microservice boundaries should be consistent with domain boundaries related to the business use case, not with how entities are modeled within those boundaries. Here's a good article that discusses decomposition, but the TL;DR is:
Microservices should be verbs, not nouns.
So, for example, you could have a CreateNewBusinessThing microservice that handles this specific case. But, for now, we'll assume you have good and valid reasons to have the services divided as they are.
The "right" solution in your case depends on the needs of the consuming service/application. If the consumer is an application or User Interface of some sort, responsiveness is required and that becomes your overriding need. If the consumer is another microservice, it may well be that it cares more about getting good "finalized" data rather than being responsive.
In either of those cases, one good option is a facade (aka gateway) service that lives between your client and the highly-dependent services. This service can receive and persist the request, then respond however you'd like. It can give the consumer a 200 - OK response with an endpoint to call back to check status of the request - very responsive. Or, it could receive a URL to use as a webhook when the response is completed from both back-end services, so it could notify the client directly. Or it could publish events of its own (it likely should). Essentially, you can tailor the facade service to provide to as many consumers as needed in the way each consumer wants to talk.
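As a rough illustration of such a facade (HTTP plumbing omitted, all names invented), the bookkeeping can be as small as this:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// A facade that accepts a request immediately (the "202 Accepted" path)
// and lets the client poll for status later.
public class FacadeService {
    public enum Status { PENDING, COMPLETED }

    private final Map<String, Status> requests = new ConcurrentHashMap<>();

    // Persist the request and hand back a token the client can poll with.
    public String accept(String payload) {
        String id = UUID.randomUUID().toString();
        requests.put(id, Status.PENDING);
        // here the facade would forward the payload to the back-end services
        return id;
    }

    // Called once both back-end services have confirmed completion.
    public void markCompleted(String id) {
        requests.replace(id, Status.COMPLETED);
    }

    // The status endpoint the client calls back on.
    public Status status(String id) {
        return requests.get(id);
    }
}
```

A real facade would also persist the request durably and publish events of its own; the token-plus-poll shape is what makes the client-facing call responsive despite eventual consistency behind it.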
There are other options too. You can look into Task-Based UI, the Saga pattern, or even just Faking It.
I think you would like to leverage the flexibility of a broker and the confirmation of a synchronous call. Both can be achieved with RabbitMQ's request/reply (RPC) pattern:
https://www.rabbitmq.com/tutorials/tutorial-six-dotnet.html
Let's say I've got a domain class whose functions are to be called in a sequence. Each function does its job, but if the previous step in the sequence is not done yet, it throws an error. The other way is for each function to first complete the step required for it to run, and then execute its own logic. I feel that this way is not good practice, since I am adding multiple responsibilities to each function, and the caller won't know which operations can happen when invoking a method.
My question is: how do I handle dependent scenarios in DDD? Is it the responsibility of the caller to invoke the methods in the right sequence? Or do we make the methods handle the dependent operations before their own logic?
Is it the responsibility of the caller to invoke the methods in the right sequence?
It's ok if those methods have a business meaning. For example, the client may book a flight and then book a hotel room. Both of these are things the client understands, and it is the client's logic to call them in this sequence. On the other hand, inserting the reservation into the database and then committing (or whatever) is technical; the client should not have to deal with that at all. The same goes for "initializing" an object, then calling other methods, then calling "close".
Requiring a sequence of technical calls is a form of temporal coupling, it is considered a bad practice, and is not directly related to DDD.
The solution is to model the problem better. There is probably a higher level use-case the caller wants achieved with this call sequence. So instead of publishing the individual "steps" required, just support the higher use-case as a whole.
In general you should always design with the goal that any sequence of valid calls actually means something (as far as the language allows).
Update: A possible model for the mentioned "File" domain:
public interface LocalFile {
    RemoteFile upload();
}

public interface RemoteFile {
    RemoteFile convert(...);
    LocalFile download();
}
From my point of view, what you are describing is the orchestration of domain model operations. That's the job of the application layer, the layer above the domain model. You should have an application service that calls the domain model methods in the right sequence; it should also take into account whether some step has left any task undone and, in such a case, tell the next step to perform it.
TLDR; Scroll to the bottom for the answer, but the backstory will give some good context.
If the caller into your domain must know the order in which to call things, then you have missed an opportunity to encapsulate business logic in your domain, which is a symptom of an anemic domain.
@RobertBräutigam made a very good point:
Requiring a sequence of technical calls is a form of temporal coupling, it is considered a bad practice, and is not directly related to DDD.
This is true, but it is worse when you do it with your domain model because non-domain concerns get intermixed with domain concerns. Intent becomes lost in a sea of non business logic. If you can, you look for a higher-order aggregate that encapsulates the ordering. To borrow Robert's example, rather than booking a flight then a hotel room, and forcing that on the client, you could have a Vacation aggregate take both and validate it.
I know that sounds wrong in your case, and I suspect you're right. There's a clear dependency that can't happen all at once, so that can't be the end of the story. When you have a clear dependency with intermediate transactions that must occur before the "final" state, we have... orchestration (think sagas, distributed transactions, domain events and all that goodness).
What you describe with file operations spans across transactions. The manipulation (state change) of the domain is transactional at each point in a distributed transaction, but is not transactional overall. So when @choquero70 says
you are describing is the orchestration of domain model operations. That's the job of the application layer, the layer upon domain model.
that's also correct. Orchestration is key. Each step must manipulate the state of the domain once, and once only, and leave it in a valid state, but it is OK for there to be multiple steps.
Each of those individual points along the timeline are valid moments in the state of your domain.
So, back to your model. If you expose a single interface with multiple possible calls to all steps, then you leave yourself open to things being called out of order. Make this impossible or at least improbable. Orchestration is not just about what to do, but what to prevent from happening. Create smaller interfaces/classes to avoid accidentally increasing the "surface area" of what could be misused accidentally.
In this way, you are guiding the caller on what to do next by feeding them valid intermediate states. But, and this is the important part, the burden on what to call in what order is not on the caller. Sure, the caller could know what to do, but why force it.
Your basic algorithm is the same: upload, transform, download.
Is it the responsibility of the caller to invoke the methods in the right sequence?
Not exactly. It is the responsibility of the caller to choose from the legitimate choices given the state of your domain. It's "your" responsibility to present these choices via business methods on your correctly modeled moment/interval aggregate suitable for the caller to use.
Or do we make the methods handle the dependent operations before it's own logic?
If you've setup orchestration correctly, this won't be necessary. But it does make sense to validate anyway.
On a side note, each step of the orchestration you do should be very linear in nature. I tell my developers to be suspicious of an orchestration step that has an if statement in it. If there's an if it's likely better to be part of another orchestration step or encapsulated in business logic.
In DDD, where should validation logic for pagination queries reside?
For example, if the service layer receives a query for a collection with parameters that look like (in Go), though feel free to answer in any language:
// in one file
package repositories
type Page struct {
    Limit  int
    Offset int
}

// Should Page, which is part of the repository
// layer, have validation behaviour?
func (p *Page) Validate() error {
    if p.Limit > 100 {
        // ...
    }
    // ...
}

type Repository interface {
    getCollection(p *Page) (collection, error)
}
// in another file
package service

import "repositories"

type Service struct {
    repository repositories.Repository
}

// service layer
func (s *Service) getCollection(p *repositories.Page) (collection, error) {
    // delegate validation to the repository layer?
    // i.e. p.Validate()
    // or call some kind of validation logic in the service layer
    // i.e. validatePagination(p)
    return s.repository.getCollection(p)
}

func validatePagination(p *repositories.Page) error {
    if p.Limit > 100 {
        // ...
    }
    // ...
}
and I want to enforce a "no Limit greater than 100" rule, does this rule belong in the Service layer or is it more of a Repository concern?
At first glance it seems like it should be enforced at the Repository layer, but on second thought, it's not necessarily an actual limitation of the repository itself. It's more of a rule driven by business constraints that belongs on the entity model. However Page isn't really a domain entity either, it's more a property of the Repository layer.
To me, this kind of validation logic seems stuck somewhere between being a business rule and a repository concern. Where should the validation logic go?
The red flag for me is the same one identified by @plalx. Specifically:
It's more of a rule driven by business constraints that belongs on the entity model
In all likelihood, one of two things is happening. The less likely of the two is that the business users are trying to dictate the technical implementation of the domain model. Every once in a while, you have a business user who knows enough about technology to try to interject these things, and they should be listened to - as a concern, not a requirement. Use cases should not define performance attributes, as those are acceptance criteria of the application itself.
That leads into the more likely scenario, in that the business user is describing pagination in terms of the user interface. Again, this is something that should be talked about. However, this is not a use case, as it applies to the domain. There is absolutely nothing wrong with limiting dataset sizes. What is important is how you limit those sizes. There is an obvious concern that too much data could be pulled back. For example, if your domain contains tens of thousands of products, you likely do not want all of those products returned.
To address this, you should also look at why you have a scenario that can return too much data in the first place. When looking at it purely from a repository's perspective, the repository is used simply as a CRUD factory. If your concern is what a developer could do with a repository, there are other ways to paginate large datasets without bleeding either a technological or application concern into the domain. If you can safely deduce that the aspect of pagination is something owned by the implementation of the application, there is absolutely nothing wrong with having the pagination code outside of the domain completely, in an application service. Let the application service perform the heavy lifting of understanding the application's requirement of pagination, interpreting those requirements, and then very specifically telling the domain what it wants.
Instead of having some sort of GetAll() method, consider having a GetById() method that takes an array of identifiers. Your application service performs a dedicated task of "searching" and determining what the application is expecting to see. The benefit may not be immediately apparent, but what do you do when you are searching through millions of records? If you want to consider using something like Lucene, Endeca, FAST, or similar, do you really need to crack open the domain for that? When, or if, you get to the point where you want to change a technical detail and you find yourself having to touch your domain, to me, that is a rather large problem. When your domain starts to serve multiple applications, will all of those applications share the same application requirements?
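The shape being suggested could be sketched like this (in Java for illustration, all names hypothetical): the application service decides which identifiers it wants, perhaps via its own search index, and the repository answers only that specific question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// The repository exposes "give me these specific products" instead of
// "give me a page of everything", so pagination stays an application concern.
public class ProductLookup {
    interface ProductRepository {
        List<String> getById(List<Integer> ids);
    }

    // Toy repository backed by a map, standing in for the domain's storage.
    static class InMemoryRepository implements ProductRepository {
        private final Map<Integer, String> products;

        InMemoryRepository(Map<Integer, String> products) {
            this.products = products;
        }

        public List<String> getById(List<Integer> ids) {
            List<String> out = new ArrayList<>();
            for (Integer id : ids) {
                String p = products.get(id);
                if (p != null) out.add(p);
            }
            return out;
        }
    }
}
```

The application service would obtain the identifier list from whatever search technology it likes (Lucene, a read model, etc.) without that choice ever leaking into the domain.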
The last point is the one that I find hits home the most. Several years back, I was in the same situation. Our domain had pagination inside of the repositories, because we had a business user who had enough sway and just enough technical knowledge to be dangerous. Despite the objections of the team, we were overruled (which is a discussion unto itself). Ultimately, we were forced to put pagination inside of the domain. The following year, we started to use the domain in the context of other applications inside the business. The actual business rules never changed, but the way that we searched did - depending on the application. That left us having to bring up another set of methods to accommodate, with the promise of reconciliation in the future.
That reconciliation came with the fourth application to use the domain, which was for an external third-party to consume, when we finally conveyed the message that these continual changes in the domain could have been avoided by allowing the application to own its own requirements and providing a means to facilitate a specific question - such as "give me these specific products". The previous approach of "give me twenty products, sorted in this fashion, with a specific offset" in no way described the domain. Each application determined what a "pagination" ultimately meant to itself and how it wanted to load those results. Top result, reversing order in the middle of a paged set, etc. Those were all eliminated because those were moved nearer their actual responsibilities and we empowered the application while still protecting the domain. We used the service layer as a delineation for what is considered "safe". Since the service layer acted as a go-between, between the domain and the application, we could reject a request at the service-level if, for example, the application requested more than one hundred results. This way, the application could not just do whatever it pleased, and the domain was left gleefully oblivious to the technical limitation being applied to the call being made.
"It's more of a rule driven by business constraints that belongs on
the entity model"
These kinds of rules generally aren't business rules; they are simply put in place (most likely by developers, without business experts' involvement) due to technical system limitations (e.g. to guarantee the system's stability). They usually find their natural home in the Application layer, but could be placed elsewhere if it's more practical to do so.
On the other hand, if business experts are interested in the resource/cost factor and decide to market this, so that customers may pay more to view more at a time, then that would become a business rule; something the business really cares about.
In that case the rule checking would certainly not go in the Repository because the business rules would get buried in the infrastructure layer. Not only that but the Repository is very low-level and may be used in automated scripts or other processes where you would not want these limitations to apply.
Actually, I usually apply some CQRS principles and avoid going through repositories entirely for queries, but that's another story.
At first glance it seems like it should be enforced at the Repository layer, but on second thought, it's not necessarily an actual limitation of the repository itself. It's more of a rule driven by business constraints that belongs on the entity model.
Actually, repositories are still domain. They're mediators between the domain and the data mapping layer. Therefore, you should still consider them part of the domain.
Therefore, a repository interface implementation should enforce domain rules.
In summary, I would ask myself: do I want to allow non-paginated access to the data abstracted by the repository from any domain operation? And the answer should probably be no, because such a domain might own thousands of domain objects, and trying to retrieve too many domain objects at once would be suboptimal, wouldn't it?
Suggestion
* Since I don't know which language the OP is currently using, and since the programming language doesn't really matter for this Q&A, I'll explain a possible approach using C# and the OP can translate it to any programming language.
For me, enforcing a no more than 100 results per query rule should be a cross-cutting concern. In contrast to what @plalx has said in his answer, I really believe that something that can be expressed in code is the way to go, and that this is not only an optimization concern but a rule enforced across the entire solution.
Based on what I've said above, I would design a Repository abstract class to provide some common behaviors and rules across the entire solution:
public interface IRepository&lt;T&gt;
{
    IList&lt;T&gt; List(int skip = 0, int take = 0);

    // Other method definitions like Add, Remove, GetById...
}

public abstract class Repository&lt;T&gt; : IRepository&lt;T&gt;
{
    protected virtual void EnsureValidPagination(int skip = 0, int take = 0)
    {
        if (take > 100)
        {
            // ArgumentException takes the message first, then the parameter name
            throw new ArgumentException("Cannot take more than 100 objects at once", "take");
        }
    }

    public IList&lt;T&gt; List(int skip = 0, int take = 0)
    {
        EnsureValidPagination(skip, take);
        return DoList(skip, take);
    }

    protected abstract IList&lt;T&gt; DoList(int skip = 0, int take = 0);

    // Other methods like Add, Remove, GetById...
}
Now EnsureValidPagination would be called by any implementation of IRepository&lt;T&gt; that also inherits from Repository&lt;T&gt;, whenever you implement an operation that returns object collections.
If you need to enforce such a rule for some specific domain, you could just design another abstract class deriving from the one I've described above and introduce the whole rule there.
In my case, I always implement a solution-wide repository base class, specialize it per domain if needed, and use it as the base class for the specific domain repository implementations.
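A self-contained sketch of that specialization idea (the ReportingRepository name and its stricter limit of 25 are hypothetical examples, not from the answer above):

```csharp
using System;
using System.Collections.Generic;

// Solution-wide base class enforcing the 100-object rule.
public abstract class Repository<T>
{
    protected virtual void EnsureValidPagination(int skip = 0, int take = 0)
    {
        if (take > 100)
        {
            throw new ArgumentException("Cannot take more than 100 objects at once", "take");
        }
    }

    public IList<T> List(int skip = 0, int take = 0)
    {
        EnsureValidPagination(skip, take);
        return DoList(skip, take);
    }

    protected abstract IList<T> DoList(int skip = 0, int take = 0);
}

// Hypothetical domain where listings are more expensive, so the base rule
// is tightened by overriding the validation hook.
public abstract class ReportingRepository<T> : Repository<T>
{
    protected override void EnsureValidPagination(int skip = 0, int take = 0)
    {
        if (take > 25)
        {
            throw new ArgumentException("Reporting queries cannot take more than 25 objects at once", "take");
        }
        base.EnsureValidPagination(skip, take);
    }
}

// Minimal concrete repository, included purely so the sketch is runnable.
public class InMemoryReportingRepository : ReportingRepository<string>
{
    protected override IList<string> DoList(int skip = 0, int take = 0) => new List<string>();
}
```

The derived class never bypasses the base rule; it only narrows it.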
Answering a comment/concern from @guillaume31 on his answer
I agree that it isn't a domain-specific rule. But Application and
Presentation aren't domain either. Repository is probably a bit too
sweeping and low-level for me -- what if a command line data utility
wants to fetch a vast number of items and still use the same domain
and persistence layers as the other applications?
Imagine you've defined a repository interface as follows:
public interface IProductRepository
{
    IList&lt;Product&gt; List(int skip = 0, int take = 0);
}
An interface doesn't define a limitation on how many products can be taken at once, but see the following implementation of IProductRepository:
public class ProductRepository : IProductRepository
{
    public ProductRepository(int defaultMaxListingResults = -1)
    {
        DefaultMaxListingResults = defaultMaxListingResults;
    }

    private int DefaultMaxListingResults { get; }

    private void EnsureListingArguments(int skip = 0, int take = 0)
    {
        // A negative maximum means "no limit configured for this instance"
        if (DefaultMaxListingResults >= 0 &amp;&amp; take > DefaultMaxListingResults)
        {
            throw new InvalidOperationException($"This repository can't take more results than {DefaultMaxListingResults} at once");
        }
    }

    public IList&lt;Product&gt; List(int skip = 0, int take = 0)
    {
        EnsureListingArguments(skip, take);

        // ... fetch and return the requested page from the data store ...
    }
}
Who said we need to hardcode the maximum number of results that can be taken at once? If the same domain is consumed by different application layers, I see no reason why you couldn't inject different constructor parameters depending on the particular requirements of those application layers.
I can see the same service layer injecting exactly the same repository implementation with different configurations depending on the consumer of the whole domain.
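For instance (the concrete limits of 50 and 10000 below are hypothetical, and the Product type and data access are stubbed out), the same repository class can be wired with a different maximum per consumer:

```csharp
using System;
using System.Collections.Generic;

// Compact, self-contained version of the configurable repository idea above.
public class ProductRepository
{
    private readonly int maxListingResults;

    // A negative value means "no limit enforced by this instance".
    public ProductRepository(int defaultMaxListingResults = -1)
    {
        maxListingResults = defaultMaxListingResults;
    }

    public IList<string> List(int skip = 0, int take = 0)
    {
        if (maxListingResults >= 0 && take > maxListingResults)
        {
            throw new InvalidOperationException(
                $"This repository can't take more results than {maxListingResults} at once");
        }
        return new List<string>(); // real data access elided
    }
}
```

A web front end might then be given new ProductRepository(defaultMaxListingResults: 50), while a command-line batch utility gets new ProductRepository(defaultMaxListingResults: 10000), all against the same class.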
Not a technical requirement at all
I want to throw in my two cents on a consensus reached by other answerers, which I believe is only partially right.
The consensus is that a limitation like the one required by the OP is a technical requirement rather than a business requirement.
BTW, it seems like no one has focused on the fact that domains can talk to each other. That is, you don't design your domain and other layers only to support the more traditional execution flow: data &lt;-&gt; data mapping &lt;-&gt; repository &lt;-&gt; service layer &lt;-&gt; application service &lt;-&gt; presentation (this is just a sample flow; there are variants of it).
The domain should be bulletproof in all possible scenarios or use cases in which it will be consumed or interacted with. Hence, you should also consider the following scenario: domain-to-domain interactions.
We should be less philosophical and more ready to see the real-world scenario; the whole rule can apply in two ways:
The entire project isn't allowed to take more than 100 domain objects at once.
One or more domains aren't allowed to take more than 100 domain objects at once.
Some argue that we're talking about a technical requirement, but for me it is a domain rule, because it also enforces good coding practices. Why? Because I really think that, at the end of the day, there's no case in which you would want to get an entire collection of domain objects. Pagination has many flavors, and one of them is infinite-scroll pagination, which can also be applied to command-line interfaces to simulate the feel of a get all operation. So force your entire solution to do things right and avoid get all operations, and the domain itself will probably end up implemented differently than it would without a pagination limitation.
BTW, you should consider the following strategy: the domain enforces that you can't retrieve more than 100 domain objects, but any layer on top of it can define a lower limit, e.g. you can't get more than 50 domain objects at once, otherwise the UI would suffer performance issues. This doesn't break the domain rule, because the domain won't complain if you artificially limit requests to within the range of its rule.
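That layered-limit strategy can be sketched as follows (all names and the concrete limits are hypothetical, chosen only to mirror the 100/50 example above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The domain enforces its own ceiling of 100 objects per retrieval.
public class ProductDomainQueries
{
    public const int DomainMaxTake = 100;

    public IList<int> ListProductIds(int skip, int take)
    {
        if (take > DomainMaxTake)
        {
            throw new ArgumentException($"The domain never returns more than {DomainMaxTake} objects at once");
        }
        return Enumerable.Range(skip, take).ToList(); // data access stubbed out
    }
}

// An application service narrows the limit to 50 for UI performance,
// staying safely within the domain's range.
public class ProductScreenService
{
    public const int UiMaxTake = 50; // stricter than the domain rule

    private readonly ProductDomainQueries queries = new ProductDomainQueries();

    public IList<int> ListForScreen(int skip, int take)
    {
        if (take > UiMaxTake)
        {
            throw new ArgumentException($"The UI never requests more than {UiMaxTake} objects at once");
        }
        return queries.ListProductIds(skip, take); // always within the domain's ceiling
    }
}
```

The application-level check can change freely (40, 25, per-device values) without the domain ever knowing or caring.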
Probably in the Application layer, or even Presentation.
Choose Application if you want that rule to hold true for all front ends (web, mobile app, etc.), Presentation if the limit has to do with how much a specific device is able to display on screen at a time.
[Edit for clarification]
Judging by the other answers and comments, we're really talking about defensive programming to protect performance.
It cannot be in the Domain layer IMO because it's a programmer-to-programmer thing, not a domain requirement. When you talk to your railway domain expert, do they bring up or care about a maximum number of trains that can be taken out of any set of trains at a time? Probably not. It's not in the Ubiquitous Language.
The Infrastructure layer (Repository implementation) is an option, but as I said, I find it inconvenient and overly restrictive to control things at such a low level. Matías's proposed implementation of a parameterized Repository is admittedly an elegant solution, though, because each application can specify its own maximum; so why not, if you really want to apply a broad, sweeping limit on XRepository.GetAll() across a whole applicative space.