Repository Pattern - Caching

I'm not sure where I should implement caching in my repository pattern.
Should I implement it in the service logic or in the repository?
GUI -> BusinessLogic (Services) -> DataAccess (Repositories)

It's a good idea not to put the caching logic directly into your repository, as that violates the Single Responsibility Principle (SRP) and Separation of Concerns. SRP essentially states that your classes should only have one reason to change. If you conflate the concerns of data access and caching policy in the same class, then if either of these needs to change you'll need to touch the class. You'll also probably find that you're violating the DRY principle, since it's easy to have caching logic spread out among many different repository methods, and if any of it needs to change, you end up having to change many methods.
The better approach is to use the Proxy or Strategy pattern to apply the caching logic in a separate type, for instance a CachedRepository, which then uses the actual db-centric repository as needed when the cache is empty. I've written two articles which demonstrate how to implement this using .NET/C#, which you will find on my blog, here:
http://ardalis.com/introducing-the-cachedrepository-pattern
http://ardalis.com/building-a-cachedrepository-via-strategy-pattern
If you prefer video, I also describe the pattern in the Proxy Design Pattern module on Pluralsight, here:
https://app.pluralsight.com/library/courses/patterns-library/table-of-contents
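For reference, here is a minimal sketch of the idea, with illustrative names and a simple time-based cache rather than the articles' exact code:

```csharp
using System;
using System.Collections.Generic;

public class Author { public string Name { get; set; } }

public interface IAuthorRepository
{
    List<Author> GetAll();
}

// The real, db-centric repository: data access is its only concern.
public class SqlAuthorRepository : IAuthorRepository
{
    public List<Author> GetAll()
    {
        // ... actual LINQ-to-SQL / EF / ADO.NET query goes here ...
        return new List<Author>();
    }
}

// The proxy: same interface, adds the caching policy, delegates on a miss.
public class CachedAuthorRepository : IAuthorRepository
{
    private static readonly TimeSpan Ttl = TimeSpan.FromMinutes(5);
    private readonly IAuthorRepository _inner;
    private List<Author> _cache;
    private DateTime _cachedAt;

    public CachedAuthorRepository(IAuthorRepository inner) { _inner = inner; }

    public List<Author> GetAll()
    {
        if (_cache == null || DateTime.UtcNow - _cachedAt > Ttl)
        {
            _cache = _inner.GetAll();   // cache miss: hit the database
            _cachedAt = DateTime.UtcNow;
        }
        return _cache;
    }
}
```

The IoC container registers CachedAuthorRepository as the IAuthorRepository implementation and injects SqlAuthorRepository into it, so consumers never know caching is happening.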

I would handle it in the repository/data access layer. The reasoning is that it isn't the business layer's job to know where the data comes from; that is the job of the repository. The repository then decides where to get the data: from the cache (if it isn't too stale) or from the live data source, based on the circumstances of the data access logic.
It's a data access concern more than a business logic issue.

Repositories (database gateway) in Clean Architecture

I have been studying Robert Martin's Clean Architecture. I loved the book, but when I looked at some tutorials and sample code that are supposed to be examples of Clean Architecture, I saw a repository interface declared in the Entities (enterprise business rules) layer, with methods referencing an entity class (which means the repository implementation in the adapters layer depends directly on the Entities layer, bypassing the application layer). Although these code samples follow the dependency rule, I have some questions about them:
Wouldn't a repository interface fit better in the application layer, since persistence is more an application rule than a core rule about the entities?
In his book, and also in his blog article (search for the "What data crosses the boundaries" subtitle), Uncle Bob says that only simple data structures should be passed across the boundaries, never entities. So are these code samples wrong?
Wouldn't a repository interface fit better in the application layer, since persistence is more an application rule than a core rule about the entities?
The repository interfaces are made to be used by interactors (a.k.a. use cases). Through the repository interface, an interactor says what it wants but not how it is fulfilled. Thus the interfaces belong to the interactors and should therefore be placed in the use cases (application) layer.
I don't know the example you mentioned in your question, so I can't investigate it to see if there is some reason why the author put the repository interfaces in the entities layer and if it could have been solved in another way.
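To make the placement concrete, a minimal sketch (illustrative names, not taken from any particular sample):

```csharp
namespace MyApp.Entities
{
    public class Order { public int Id { get; set; } }
}

namespace MyApp.UseCases
{
    using MyApp.Entities;

    // Declared beside its consumer: the interactor says what it wants,
    // not how it is fulfilled.
    public interface IOrderRepository
    {
        Order FindById(int orderId);
    }

    public class ShowOrderDetailsInteractor
    {
        private readonly IOrderRepository _orders;
        public ShowOrderDetailsInteractor(IOrderRepository orders) { _orders = orders; }
        public Order Execute(int orderId) { return _orders.FindById(orderId); }
    }
}

namespace MyApp.Adapters
{
    using MyApp.Entities;
    using MyApp.UseCases;

    // The adapters layer implements the interface; the dependency points inwards.
    public class SqlOrderRepository : IOrderRepository
    {
        public Order FindById(int orderId)
        {
            // ... actual database access goes here ...
            return new Order { Id = orderId };
        }
    }
}
```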
In his book, and also in his blog article (search for the "What data crosses the boundaries" subtitle), Uncle Bob says that only simple data structures should be passed across the boundaries, never entities. So are these code samples wrong?
Uncle Bob says
Typically the data that crosses the boundaries is simple data structures.
and later
For example, many database frameworks return a convenient data format in response to a query. We might call this a RowStructure. We don’t want to pass that row structure inwards across a boundary.
Let me explain why I emphasized the word typically.
So what he doesn't want in his architecture is for classes, or any types, defined in the database layer to appear in the entities or the interactors, because that would be a dependency from higher-level modules on details.
Sometimes developers want to implement certain use cases in a more relaxed way, but that comes at a price.
E.g. when you have a show-order-details use case and you have loaded the complete order entity in your interactor, you usually have to copy almost all of its data to the data structure that is passed across the boundary (the output port) in order to strictly follow the architecture.
You could also pass the entity to the output port, but this would violate the strict architecture rules. So if you want to follow the strict rules, you must copy the data to a new data structure; if not, you pass the entity to the output port.
But before you pass the entity directly to the output port, you should consider the pros and cons.
If you pass the entity to the output port, you have a dependency from the interface adapters layer to the entity layer. The dependency rule that every dependency should point inwards is still satisfied, but the dependency skips one layer.
Such a relaxed architecture introduces the risk that e.g. controllers might invoke business methods on entities, which is exactly what they shouldn't do. It's up to you to decide whether you (and your colleagues) are disciplined enough not to invoke business methods from the controller, or whether you'd better protect yourself and others from doing so by introducing a new data structure and doing the copying.
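A minimal sketch of the strict variant (illustrative names): the interactor copies the entity's data into a plain output structure before anything crosses the boundary.

```csharp
// Only a plain data structure crosses the boundary; the presenter behind
// the output port never sees the Order entity itself.
public class Order
{
    public int Id { get; set; }
    public string CustomerName { get; set; }
    public decimal Total { get; set; }
}

public class OrderDetailsOutput          // simple data structure, no behavior
{
    public int Id;
    public string CustomerName;
    public decimal Total;
}

public interface IOrderDetailsOutputPort
{
    void Present(OrderDetailsOutput output);
}

public class ShowOrderDetailsInteractor
{
    private readonly IOrderDetailsOutputPort _outputPort;
    public ShowOrderDetailsInteractor(IOrderDetailsOutputPort outputPort)
    {
        _outputPort = outputPort;
    }

    public void Execute(Order order)     // loading the entity is omitted here
    {
        // The copy effort: entity data is mapped field by field.
        _outputPort.Present(new OrderDetailsOutput
        {
            Id = order.Id,
            CustomerName = order.CustomerName,
            Total = order.Total
        });
    }
}
```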
From my point of view, which is influenced a little more by Domain-Driven Design than by Clean Architecture alone, I see repositories as clear members of the domain layer (the layer mostly referred to as the business or core layer in Clean Architecture).
There are mainly two reasons for that:
1. Repositories are domain services that contain logic meaningful to business people.
Like finding an entity (aggregate) based on different criteria. For instance, a customer support employee, when on a support call, might have to search for an order either by the order number or, if the customer does not have that, by customer name and other criteria. The order repository would provide something like findOrderByCustomerName(), which reflects the business language and needs. So repositories can contain more than just methods to retrieve and save the entity. In some cases they can also provide a primitive value. For instance, a repository for incident tickets could give you the number of unresolved tickets for a specific product. Again, it needs to fulfill some business-logic-related purpose.
2. The domain (business) layer itself might need to consume a repository in order to perform some business logic.
If business logic does not fit within one single entity, it is often the case that some kind of domain service is needed to contain the required logic. And this service containing business logic might need access to a repository to fulfill its tasks.
Let's say we have a company-internal meeting room reservation application. If you want to make a new appointment, there might be business logic saying that it must not be possible to schedule an appointment if there is already another appointment at the same time in the same meeting room. So the domain logic to ask whether the requested meeting room is still free at that time might be part of the meeting room repository, providing something like IsRoomFreeAt(RoomId roomId, DateTime requestedMeetingTime). And there could be a service in the domain layer executing all the related business logic which needs this repository's functionality. Even if you don't want to apply that query pattern and would rather just fetch the respective meeting room and check its free status instead, the result would be the same.
Whether one approach or the other is better suited to the respective application use case is another question. The important part is that the business logic to make sure no rooms are double-booked requires some information from a repository, no matter how (see the sketch below).
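Here is a minimal sketch combining both points (illustrative names): a repository interface with a business-meaningful query, consumed by a domain service that enforces the rule.

```csharp
using System;

public sealed class RoomId
{
    public int Value { get; }
    public RoomId(int value) { Value = value; }
}

public class Appointment
{
    public RoomId Room { get; }
    public DateTime Start { get; }
    public Appointment(RoomId room, DateTime start) { Room = room; Start = start; }
}

// Repository interface in the domain layer, with a business-meaningful query.
public interface IMeetingRoomRepository
{
    bool IsRoomFreeAt(RoomId roomId, DateTime requestedMeetingTime);
    void Add(Appointment appointment);
}

// Domain service: business logic that doesn't fit a single entity.
public class AppointmentSchedulingService
{
    private readonly IMeetingRoomRepository _rooms;
    public AppointmentSchedulingService(IMeetingRoomRepository rooms) { _rooms = rooms; }

    public void Schedule(RoomId roomId, DateTime start)
    {
        // The no-double-booking rule needs information from the repository.
        if (!_rooms.IsRoomFreeAt(roomId, start))
            throw new InvalidOperationException("Room is already booked at that time.");
        _rooms.Add(new Appointment(roomId, start));
    }
}
```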
If the repository were located in the application layer, the dependencies would no longer point inwards: the domain (business) layer would depend on the application layer rather than just the other way around.

About dependency of Entity from Gateway in Clean Architecture

I have a question about Gateway to Entity dependency in Clean Architecture.
I think the well-known concentric-circles figure is often presented as Clean Architecture.
In that figure, the Gateway does not reference the Entity directly.
However, there is another famous illustration of Clean Architecture (a second figure).
When it comes to actually creating the classes, I think they end up with the following structure.
Looking at this, the DataAccessInterface, which plays the role of the Gateway, refers to the Entity.
This seems to violate the rule from the first figure that the Gateway does not refer to the Entity. What does this mean?
Also, if the Gateway refers to the Entity, and you want to return your own error from the Gateway, you need to put your own error definition in the Entity layer. That error is specific to the application, and I feel it does not match the domain logic. Is such a definition correct for Clean Architecture?
The most important thing is that dependencies point inwards. That means, that the layer containing the most precious business logic should not have dependencies to the outer layers.
If the Gateway works directly with some entity from the domain (or logic layer, named "Entities" in the diagram you're referencing) and has a good reason to directly access it and bypass the application layer altogether, this can be a valid and pragmatic decision.
Sometimes there is no complex application logic that would justify introducing the indirection of the application layer. However, it is good to keep an eye on such decisions and make sure the application is refactored when the time comes that an application service translating between the Gateway and the entity layer makes sense. Just make sure you work with what the entity (business) layer provides to fit the business logic and invariants. As soon as you have to change something in your entity just to fit some need of the outer layers, this is a smell, and some indirection (such as an application layer translating between the two layers) is definitely needed.
Concerning the error, it is a similar decision. As long as the entity (in the domain logic layer) only has to know about errors that are meaningful from the business perspective, it can be okay to use that error directly if it fits the needs of the "outside" world the Gateway is talking to.
So again, it is a decision depending on your situation. Just keep in mind that it is important NOT to extend the entities in the business/domain logic layer just to fit infrastructure-related needs, such as providing a different structuring of error information or additional technical details that have no meaning in the business world.
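A minimal sketch of what that can look like (illustrative names): the Gateway abstraction references the entity and reports failures with a business-meaningful error, so nothing infrastructure-specific leaks into the entity layer.

```csharp
using System;

// Entity layer: knows nothing about data access.
public class Customer { public int Id { get; set; } }

// A business-meaningful error; it belongs to the domain vocabulary, so the
// entity layer is not being extended for infrastructure needs.
public class CustomerNotFoundException : Exception
{
    public CustomerNotFoundException(int id)
        : base("No customer with id " + id + " exists.") { }
}

// The Gateway abstraction: depends inwards on the entity layer only.
public interface ICustomerGateway
{
    Customer FindById(int id);   // throws CustomerNotFoundException when absent
}
```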

Where to put such business logic? Service vs DAO?

Given:
Spring MVC - Hibernate.
Controller -> Service -> DAO
Now I have a method which retrieves something from the DB, and EVERY TIME it does this it has to call another method, say "processList" (something like changing some values in the list depending on some screen parameters).
Question:
What layer do I put this "processList"? (Controller, Service or DAO? and why)
I could really use some J2EE clarification here. I know that MVC is the same across languages, but I just want to be sure :) If I were doing this in .NET, I would undoubtedly have put it in the service.
This really depends on what processList is doing exactly. There is no golden rule. Some rules I try to follow are:
Never make calls between main objects on the same layer.
ManagementServiceImpl should never call NotificationServiceImpl.
Don't make circular dependencies between objects.
This is very similar to the one above.
If you find yourself repeating some logic across multiple main objects, try to restructure the code and extract it into specialized logical classes (this will improve unit testing as well).
E.g. UserUpdateHandler or NotificationDispatcher (these are still owned by the service layer; no one else is allowed to call them)...
Put the code where it logically belongs.
Don't get distracted by the fact that some class needs to do something. It might not be the right place for the code.
Never write fully generalised code before you need to.
This even has a name: premature generalisation, a bad practice similar to premature optimisation. Saving a few lines of code now can lead to pulling your hair out in the future.
Always write code which is able to become generalised.
This does not contradict the previous rule. It says: always write with generalisation in mind, but don't bother writing it if it is not needed. Think ahead, but don't necessarily act ahead.
Leave business logic to service layer, data persistence logic to data layer and user interaction logic to presentation layer.
Don't try to parse user input in the service layer; it does not belong there, just as computing the final price in an e-shop application does not belong in the presentation layer.
Some examples for processList:
Example I - fetching additional relations via Hibernate#initialize
This is something which is really in between the service and DAO layers. On older projects we had a specialized FetchHandler class (owned by the service layer). In newer projects we leave this completely to the DAOs.
Example II - go through list and add business validation messages to the result
service layer, no doubt
Example III - go through the list and prepare UI messages based on the validation errors
presentation layer
Side note:
MVC is a different thing from the three-layered architecture. The M (model) spans all three layers, while the presentation layer includes both the V (views) and the C (controllers).

Repository pattern with "modern" data access strategies

So I was searching the web looking for best practices when implementing the repository pattern with multiple data stores when I found my entire way of looking at the problem turned upside down. Here's what I have...
My application is a BI tool pulling data from (as of now) four different databases. Due to internal constraints, I am currently using LINQ-to-SQL for data access but require a design that will allow me to change to Entity Framework or NHibernate or the next data access du jour. I also hold steadfast to decoupled layers in my apps using an IoC framework (Castle Windsor in this case).
As such, I've used the Repository pattern to abstract the actual data access code from my business layer. As a result, my business object is coded against some I<Entity>Repository interface and the IoC Container is used to manage the actual implementation. In this case, I would expect to have a concrete Linq<Entity>Repository that implements the interface using LINQ-to-SQL to do the work. Later I could replace this with an EF<Entity>Repository with no changes required to my business layer.
Also, because I'm coding against the interface, I can easily mock the repository for unit testing purposes.
So the first question that I have as I begin coding the application is whether I should have one repository per DataContext or per entity (as I've typically done)? Let's say one database contains Customers and Sales with the expected relationship. Should I have a single OrderTrackingRepository with methods that work with both entities or have a separate CustomerRepository and a different SalesRepository?
Next, as a BI tool, the primary interface is for reporting, charting, etc and often will require a "mashup" of data across multiple sources. For instance, the reality is that one database contains customer information while another handles sales information and a third holds other financial information but one of my requirements is to display aggregated information that spans all three. Plus, I have to support dynamic filtering in the UI. Obviously working directly against the LINQ-to-SQL or EF DataContext objects (Table<Entity>, for instance) will allow me to pretty much do anything. What's the best approach to expose that same functionality to my business logic when abstracting the DAL with a repository interface?
This article: link text indicates that EF4 has turned this approach around and that the repository is nothing more than an IQueryable returned from the EF DataContext, which brings up a whole other set of questions.
But, I think I've rambled on enough...
UPDATE (Thanks, Steven!)
Okay, let me put a more tangible (for me, at least) example on the table and clarify a few points that will hopefully lead to an approach I can better wrap my head around.
While I understand what Steven has proposed, I have a team of developers I have to consider when implementing such things and I'm afraid they will get lost in the complexity (yes, a real problem here!).
So, let's remove any direct tie-in with LINQ-to-SQL because I don't want a solution that is dependent upon the way L2S works - or even EF, for that matter. My intent has been to abstract away the data access technology being used so that I can change it as needed without requiring collateral changes to the consuming code in my business layer. I've accomplished this in the past by presenting the business layer with IRepository interfaces to work against. Perhaps these should have been named IUnitOfWork or, more to my liking, IDataService, but the goal is the same. These interfaces typically exposed methods such as Add, Remove, Contains, and GetByKey.
Here's my situation. I have three databases to work with. One is DB2 and contains all of the business information for a customer (franchise), such as their info and their Products, Orders, etc. Another, a SQL Server database, contains their financial history, while a third SQL Server database contains application-specific information. The first two databases are shared by multiple applications.
Through my application, the customer may enter/upload their financial information for a given time period. When entered, I have to perform the following steps:
1. Validate the entered data against a set of static rules. For example, the data must contain a legitimate customer ID value (in the case of an upload). This requires a lookup in the DB2 database to verify that the supplied customer ID exists and is current.
2. Next I have to validate the data against a set of dynamic rules which are contained in the third (SQL Server) database. An example may be that a given value cannot exceed a certain percentage of another value.
3. Once validated, I persist the data to the second SQL Server database containing the financial data.
All the while, my code must have loosely coupled dependencies so I may mock them in my unit tests.
As part of the analysis, I know that I have three distinct data stores to work with and about a half-dozen or so entities (at this time) that I am working with. In generic terms, I presume that I would have three DataContexts in my application, one per data store, with the entities exposed by the appropriate data context.
I could then create a separate I{repository|unit of work|service} for each entity that would be consumed by my business logic, with a concrete implementation that knows which data context to use. But this seems a risky proposition: as the number of entities increases, so does the number of individual repository|UoW|service types.
Then, take the case of my validation logic which works with multiple entities and, thereby, multiple data contexts. I'm not sure this is the most efficient way to do this.
The other requirement that I have yet to mention is on the reporting side where I will need to execute some complex queries on the data stores. As of right now, these queries will be limited to a single data store at a time, but the possibility is there that I might need to have the ability to mash data together from multiple sources.
Finally, I am considering the idea of pulling out all of the data access stuff for the first two (shared) databases into their own project and have been looking at WCF Data Services as a possible approach. This would give me the basis for a consistent approach for any application making use of this data.
How does this change your thinking?
In your case I would recommend returning IEnumerables from your data queries in the repo. I usually aggregate calls from multiple repos through a service class that represents the domain problem and encapsulates my business logic. To keep it clean, I try to keep my repos focused on the domain problem. I liken my DataContext to a repo, and extract an interface using a T4 template to make life easier for mocking. But there is nothing stopping you from using a traditional repo that encapsulates your calls. Doing it this way will allow you to switch ORMs at any stage.
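Something like this (a sketch with made-up names, not tied to any particular ORM):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer { public int Id { get; set; } }
public class Sale { public decimal Amount { get; set; } }

public interface ICustomerRepository
{
    IEnumerable<Customer> FindByRegion(string region);
}

public interface ISalesRepository
{
    IEnumerable<Sale> FindByCustomer(int customerId);
}

// The service represents the domain problem and owns the business logic;
// the repos stay focused on data access and can be mocked in tests.
public class CustomerSalesService
{
    private readonly ICustomerRepository _customers;
    private readonly ISalesRepository _sales;

    public CustomerSalesService(ICustomerRepository customers, ISalesRepository sales)
    {
        _customers = customers;
        _sales = sales;
    }

    public decimal TotalSalesForRegion(string region)
    {
        return _customers.FindByRegion(region)
                         .SelectMany(c => _sales.FindByCustomer(c.Id))
                         .Sum(s => s.Amount);
    }
}
```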
EDIT: IQueryable IS NOT THE ANSWER! :-)
I have also done a lot of work in this area, and INITIALLY came to the same conclusion; however, it is NOT a good solution. The point of the repo is to abstract queries into discrete chunks of work. Exposing IQueryable is too ad hoc and raises issues later down the line. You lose your ability to scale. You lose your ability to optimize queries (let's say I want to move to a highly optimized stored proc). You lose your ability to use IoC for the repo to switch out data access layers (switch the project from SQL to Mongo). You lose your ability to provide effective data caching in the repo (which is a major strength of the repo pattern). I would recommend taking a CLOSE look at WHY we have the repo pattern. It isn't simply an "ORM" mapping layer. What made this really clear to me was the CQRS pattern.
Furthermore, the ad-hoc nature of IQueryable invites ill-fitting reuse of queries. It is GENERALLY not a good idea to reuse queries, since from query to query you see slight deviations, which ends up with two byproducts: queries become too broad and inefficient, and queries become riddled with unmaintainable IF THEN statements to cater for the deviations.
IQueryable is easy, but opens you up to an unmaintainable mess.
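To make the contrast concrete, a small sketch (illustrative names): a discrete, named chunk of work instead of a leaked IQueryable. The implementation behind it can later become a stored proc, a cached call, or a different store without touching callers.

```csharp
using System;
using System.Collections.Generic;

public class Order { public DateTime DueDate { get; set; } }

public interface IOrderRepository
{
    // Not: IQueryable<Order> Orders { get; }  -- too ad hoc, leaks the ORM.
    IReadOnlyList<Order> FindOverdueOrders(DateTime asOf);
}
```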
Look at this SO answer. I think it shows a simplified model of what you want. IQueryable<T> is indeed our new Repository :-). DataContext and ObjectContext are our Unit of Work.
UPDATE 2:
Here is a blog post that describes the model you might be looking for.
UPDATE 3
It would be wise to hide the shared databases behind a service. This will solve several problems:
This will make the database private to the service, which makes it much easier to change the implementation when needed.
You can put the needed validation logic (for database 1) in that service and can create tests for that validation logic in that project.
Clients accessing that service can assume correctness of the service, and its validation logic.
The result of this is that your application will send data to the service to validate it, call the service to fetch data, query its own private database (database 3), and join the data from the three data sources together locally. I've never been a fan of using cross-database or even cross-server (in your situation) database calls and letting the database join everything together. Transactions will be promoted to distributed transactions, and it's hard to predict how much data the servers will exchange.
When you abstract the shared databases behind the service, things get easier (at least from your application's point of view). Your application calls services it trusts, which limits the amount of code in the application and the number of tests. You still want to mock the calls to such a service, but that would be pretty easy. It should also solve the problem of validating over multiple data sources.
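As a sketch of what such a service contract might look like from the application's point of view (hypothetical names, independent of whether you end up using WCF Data Services):

```csharp
using System.Collections.Generic;

public class FinancialUpload { /* the entered/uploaded financial data */ }

public class ValidationResult
{
    public bool IsValid;
    public List<string> Errors = new List<string>();
}

public class CustomerInfo { public int Id; public bool IsCurrent; }

// Hypothetical contract: the application validates and fetches through this
// interface instead of reaching into the shared databases directly.
public interface IFinancialDataService
{
    CustomerInfo GetCustomer(int customerId);          // lookup in the shared DB2 database
    ValidationResult Validate(FinancialUpload upload); // validation rules for the shared data live here
    void Save(FinancialUpload upload);                 // persists to the financial database
}
```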
Validation is always a hard part. I'm very familiar with the Validation Application Block, and love it for its flexibility. It isn't an easy framework, but you might take a peek at what you can do with it. For instance, I've written several articles about integration with O/RM tools and how to 'embed' a context (context as in DataContext/unit of work) in the Validation Application Block.
Please have a look at my IRepository pattern implementation using EF 4.0.
My solution has the following features:
Supports connections to multiple databases
One repository per entity
Support for execution of queries
Unit of Work pattern implementation
Support for validating entities using VAB guidance
Common operations kept at the base class level, with heavy use of OOP techniques for code reusability and ease of maintenance
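For orientation, this is the general shape such a solution often takes (a sketch, not the actual code behind the link):

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public interface IRepository<T> where T : class
{
    T GetByKey(object key);
    IEnumerable<T> Query(Expression<Func<T, bool>> predicate);
    void Add(T entity);
    void Remove(T entity);
}

public interface IUnitOfWork : IDisposable
{
    IRepository<T> Repository<T>() where T : class;   // one repository per entity
    void Commit();                                    // saves all pending changes at once
}
```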

Interface Insanity

I'm drinking the Kool-Aid and loving it: interfaces, IoC, DI, TDD, etc. Working out pretty well. But I'm finding I have to fight a tendency to make everything an interface! I have a factory which is an interface. Its methods return objects which could be interfaces (might make testing easier). Those objects have the services they need DI'ed in as interfaces. What I'm finding is that keeping the interfaces in sync with the implementations adds to the work: adding a method to a class means adding it to the class + the interface, the mocks, etc.
Am I factoring the interfaces out too early? Are there best practices to know when something should return an interface vs. an object?
Interfaces are useful when you want to mock an interaction between an object and one of its collaborators. However there is less value in an interface for an object which has internal state.
For example, say I have a service which talks to a repository in order to extract some domain object in order to manipulate it in some way.
There is definite design value in extracting an interface from the repository. My concrete implementation of the repository may well be strongly linked to NHibernate or ActiveRecord. By linking my service to the interface I get a clean separation from this implementation detail. It just so happens that I can also write super fast standalone unit tests for my service now that I can hand it a mock IRepository.
Considering the domain object which came back from the repository and which my service acts upon, there is less value. When I write tests for my service, I will want to use a real domain object and check its state. E.g. after the call to service.AddSomething() I want to check that something was added to the domain object. I can test this by simple inspection of the state of the domain object. When I test my domain object in isolation, I don't need interfaces, as I am only going to perform operations on the object and quiz it about its internal state. E.g. is it valid for my sheep to eat grass if it is sleeping?
In the first case, we are interested in interaction based testing. Interfaces help because we want to intercept the calls passing between the object under test and its collaborators with mocks. In the second case we are interested in state based testing. Interfaces don't help here. Try to be conscious of whether you are testing state or interactions and let that influence your interface or no interface decision.
Remember that (providing you have a copy of Resharper installed) it is extremely cheap to extract an interface later. It is also cheap to delete the interface and revert to a simpler class hierarchy if you decide that you didn't need that interface after all. My advice would be to start without interfaces and extract them on demand when you find that you want to mock the interaction.
When you bring IoC into the picture, then I would tend to extract more interfaces - but try to keep a lid on how many classes you shove into your IoC container. In general, you want to keep these restricted to largely stateless service objects.
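A small sketch of the two testing styles (hand-rolled fake, no mocking framework assumed):

```csharp
// Interaction-based testing: the service gets a fake repository, so a test
// can intercept the collaboration between the two.
public interface ISheepRepository { void Save(Sheep sheep); }

public class FakeSheepRepository : ISheepRepository
{
    public Sheep Saved;                           // the test inspects this
    public void Save(Sheep sheep) { Saved = sheep; }
}

public class SheepService
{
    private readonly ISheepRepository _repository;
    public SheepService(ISheepRepository repository) { _repository = repository; }

    public void Shear(Sheep sheep)
    {
        // ... business logic ...
        _repository.Save(sheep);
    }
}

// State-based testing: use the real domain object and quiz its state;
// no interface needed.
public class Sheep
{
    public bool IsSleeping { get; set; }
    public bool CanEatGrass() { return !IsSleeping; }
}
```

A service test hands SheepService a FakeSheepRepository and asserts on Saved; the Sheep tests simply construct a Sheep and check CanEatGrass().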
Sounds like you're suffering a little from BDUF.
Take it easy with the Kool-Aid and let it flow naturally.
Remember that while flexibility is a worthy objective, the added flexibility of IoC and DI (which to some extent are prerequisites for TDD) also increases complexity. The only point of flexibility is to make downstream changes quicker, cheaper, or better. Each IoC/DI point increases complexity and thus contributes to making changes elsewhere more difficult.
This is actually where you need a Big Design Up Front to some extent: identify what areas are most likely to change (and/or need extensive unit testing), and plan for flexibility there. Refactor to eliminate flexibility where changes are unlikely.
Now, I'm not saying that you can guess where flexibility will be needed with any kind of accuracy. You'll be wrong. But it's likely that you'll get something right. Where you later find you don't need flexibility, it can be factored out in maintenance. Where you need it, it can be factored in when adding features.
Now, areas which may or may not change depends on your business problem and IT environment. Here are some recurring areas.
I'd always consider external interfaces where you integrate with other systems to be highly mutable.
Whatever code provides a back end to the user interface will need to support change in the UI. However, plan primarily for changes in functionality: don't go overboard and plan for different UI technologies (such as supporting both a smart client and a web application; usage patterns will differ too much).
On the other hand, coding for portability to different databases and platforms is usually a waste of time, at least in corporate environments. Ask around and check what plans may exist to replace or upgrade technologies within the likely lifespan of your software.
Changes to data content and formats are a tricky business: while data will occasionally change, most designs I've seen handle such changes poorly, and thus you get concrete entity classes used directly.
But only you can make the judgement of what might or should not change.
I usually find that I want interfaces for "services" - whereas types which are primarily about "data" can be concrete classes. For instance, I'd have an Authenticator interface, but a Contact class. Of course, it's not always that clear-cut, but it's an initial rule of thumb.
I do feel your pain though - it's a little bit like going back to the dark days of .h and .c files...
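As a quick sketch of that rule of thumb (illustrative names):

```csharp
// A "service" gets an interface so implementations can vary and be mocked.
public interface IAuthenticator
{
    bool Authenticate(string user, string password);
}

// A "data" type stays a concrete class; another implementation is unlikely.
public class Contact
{
    public string Name { get; set; }
    public string Email { get; set; }
}
```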
I think the most important "agile" principle is YAGNI ("You Ain't Gonna Need It"). In other words, don't write extra code until it's actually needed, because if you write it in advance the requirements and constraints may well have changed when (if!) you finally do need it.
Interfaces, dependency injections, etc. - all this stuff adds complexity to your code, making it harder to understand and change. My rule of thumb is to keep things as simple as possible (but no simpler) and to not add complexity unless it gains me more than enough to offset the burden it imposes.
So if you are actually testing and having a mock object would be very useful then by all means define an interface that both your mock and real classes implement. But don't create a bunch of interfaces on the purely hypothetical grounds that it might be useful at some point, or that it is "correct" OO design.
Interfaces have the purpose of establishing a contract and are particularly useful when you want to swap the class performing a task on the fly. If there is no need to change the classes, an interface just might get in the way.
There are automatic tools to extract an interface, so maybe you'd better delay the interface extraction until later in the process.
It depends very much on what you are providing... if you are working on internal things then the advice of "don't do them until needed" is reasonable. If, however, you are making an API that is to be consumed by other developers then changing things around to interfaces at a later date can be annoying.
A good rule of thumb is to make interfaces out of anything that needs to be subclassed. This is not an "always make an interface in that case" sort of thing; you still need to think about it.
So, the short answer is (and this works with both internal things and providing an API) is that if you anticipate more than one implementation is going to be needed then make it an interface.
Some things that generally would not be interfaces are classes that only hold data, like, say, a Location class that deals with x and y. The odds of there being another implementation of that are slim.
Do not create the interfaces first; you ain't gonna need them. You cannot guess which classes will need an interface and which won't, so don't spend any time burdening the code with useless interfaces now.
But extract them when you feel the urge to do so - when you see the need for an interface - at the refactoring step.
Those answers can help too.
