Need advice regarding a layered solution, separation of concerns, etc.

We have a layered application, or at least we are in the process of transitioning to one, broken down as follows:
Interface (user interface or application interface, i.e. web service, etc.)
Business logic
Data access
To make the rest of this question more concrete, I'll describe a specific instance.
We have a user interface, which has a controller object behind it (the business logic layer). This controller talks to the database through another object (the data access layer).
In a given context, the user interface allows the user to pick an employee to tie the current operation to. Since there are rules about which employees the user (well, anything outside of the controller, really) can pick, the controller provides two things for this:
a readable property containing a list of available employees to choose from
a read/writable property containing the currently chosen employee
The user interface might read the list and use it to populate a combobox.
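In code, the controller's contract looks roughly like this (a simplified sketch with illustrative names, not our actual code):

    using System.Collections.Generic;

    public class Employee
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    // Data access layer, as seen by the controller.
    public interface IEmployeeDataAccess
    {
        IList<Employee> GetPickableEmployees();
    }

    // Business logic layer.
    public class OperationController
    {
        private readonly IEmployeeDataAccess dataAccess;

        public OperationController(IEmployeeDataAccess dataAccess)
        {
            this.dataAccess = dataAccess;
        }

        // Readable property: the employees the outside world may pick from.
        public IList<Employee> AvailableEmployees
        {
            get { return dataAccess.GetPickableEmployees(); }
        }

        // Read/writable property: the currently chosen employee.
        public Employee SelectedEmployee { get; set; }
    }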
In version 1 of this application, the combobox contains the identifying number of the employee + the name of the employee.
All is well...
... until version 1.1, a bugfix. A user complains that he can't choose between Jimmy Olson and Jimmy Olson, because the application doesn't make it easy enough for him to tell which is which. He knows there is one Jimmy in the sales department and another in the development department, so the fix for the 1.1 version is to simply tack a slash and the department name onto the combobox entry. In version 2 we would opt for replacing the combobox with one that has column support, removing the slash, but for 1.1 this is what was chosen in order to minimize the risk of further bugs.
In other words, the combobox would contain:
1 - Jimmy Olson/Sales
2 - Jimmy Olson/Development
other people
However, the user interface code has no SQL code, nor any way to get hold of that department, so we have to go to the controller and look at the code there. The controller doesn't need the department, and to be honest, it doesn't even need the name of the employee; the identifying number is enough. So there is nothing in the controller that asks for or does anything with the department, and we have to go down to the data access layer and change the SQL there.
This solution quite frankly smells.
If there are multiple interfaces to this controller, with different requirements, we have three possible solutions:
Change the data access layer to cater for the (increasing/diverse) needs of multiple interfaces (two layers away), which means every interface will get all the data it needs, but each will also receive all the data required by any of the other interfaces
Add something that lets the user interface tell the data access layer (still 2 layers away) what it needs
Somehow make the user interface layer get hold of the required data without changing either the controller or the data access layer involved, which sounds like we need more data access code somewhere
None of the above solutions feel good.
What I'm wondering is: are we completely off course? How would you do this? Is there a fourth or fifth solution beyond the three above?
Per this question: Separation of Concerns, the accepted answer contains this quote:
The separation of concerns is keeping the code for each of these concerns separate. Changing the interface should not require changing the business logic code, and vice versa.
Does this simply mean that all the controller/data access layer should provide is whatever it needs to do its job (i.e. the identifying numbers of employees), and that the user interface should then go talk to the database and ask for more information about these specific employees?

The way I see it, you have two possibilities:
Have the lower layers send up all the information they have about a person, possibly as an XML document, even though many of the consumers of that information don't need it all.
Provide APIs for the higher-level layers to drill down and get the information they need. So in the case you give, have a method through which the interface can ask the business layer to ask the database layer for the department given the user id.
Both have trade-offs. The first one exposes a lot more information, possibly to consumers who don't have any right to that information. The second one passes a lot less information per transaction, but requires more transactions. The first one doesn't require a change to the API every time you have more information, but changes the XML. The second keeps the interface of existing APIs the same, but provides new APIs as needs change. And so on.
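To make the trade-off concrete, here is a minimal C# sketch of the two options (all names are illustrative):

    using System.Collections.Generic;

    // Option 1: the lower layers send up everything they know about an
    // employee, whether or not a given consumer needs it all.
    public class EmployeeInfo
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Department { get; set; }
        // ...every other column the data access layer can supply
    }

    public interface IEmployeeRepositoryOption1
    {
        IList<EmployeeInfo> GetPickableEmployees();
    }

    // Option 2: the lower layers expose narrow drill-down calls, and the
    // interface asks for exactly what it needs, when it needs it.
    public interface IEmployeeRepositoryOption2
    {
        IList<int> GetPickableEmployeeIds();
        string GetName(int employeeId);
        string GetDepartment(int employeeId);
    }

Adding the department for version 1.1 only changes the data option 1 returns; under option 2 it adds one new method while the existing signatures stay put.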

Related

Repositories (database gateway) in Clean Architecture

I have been studying Robert Martin's Clean Architecture. I loved this book, but when I looked at some tutorials and code that are supposed to be examples of clean architecture, I saw a repository interface declared in the Entities (enterprise business rules) layer, with methods referencing an Entities class (which means the repository implementation in the adapters layer depends directly on the Entities layer, bypassing the application layer). Although these code samples follow the dependency rule, I have some questions about them:
Wouldn't a repository interface fit better in the application layer, since persistence is more an application rule than a core rule about the entities?
In his book, and also in his blog article (search for the "What data crosses the boundaries" subtitle), Uncle Bob says that only simple data structures should be passed across the boundaries, never entities. So are these code samples wrong?
Wouldn't a repository interface fit better in the application layer, since persistence is more an application rule than a core rule about the entities?
The repository interfaces are made to be used by interactors (a.k.a. use cases). With a repository interface, an interactor says what it wants but not how it is fulfilled. Thus the interfaces belong to the interactors and should therefore be placed in the use case (application) layer.
I don't know the example you mentioned in your question, so I can't investigate it to see if there is some reason why the author put the repository interfaces in the entities layer and if it could have been solved in another way.
In his book, and also in his blog article (search for the "What data crosses the boundaries" subtitle), Uncle Bob says that only simple data structures should be passed across the boundaries, never entities. So are these code samples wrong?
Uncle Bob says
Typically the data that crosses the boundaries is simple data structures.
and later
For example, many database frameworks return a convenient data format in response to a query. We might call this a RowStructure. We don’t want to pass that row structure inwards across a boundary.
Let me explain why I highlighted "typically".
What he doesn't want in his architecture is for classes, or any other types, defined in the database layer to appear in the entities or the interactors, because that would be a dependency from higher-level modules on details.
Sometimes there are use cases that some developers want to implement in a more relaxed way, but that comes at a price.
E.g. when you have a "show order details" use case and you have loaded the complete order entity in your interactor, you usually have to copy almost all of its data to the data structure that passes the boundary (the output port) in order to strictly follow the architecture.
You could also pass the entity to the output port, but this would violate the strict architecture rules. So if you want to follow the strict rules, you must copy the data into a new data structure; if not, you pass the entity to the output port.
But before you pass the entity directly to the output port, you should consider the pros and cons.
If you pass the entity to the output port, you have a dependency from the interface adapters layer on the entities layer. The dependency rule that every dependency should point inwards still holds, but the dependency skips one layer.
Such a relaxed architecture introduces the risk that e.g. controllers might invoke business methods on entities, which is exactly what they shouldn't do. It's up to you to decide whether you (and your colleagues) are disciplined enough not to invoke business methods from the controller, or whether you would rather protect yourself and others from doing so by introducing a new data structure and doing the copy work.
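A minimal sketch of the strict variant, with hypothetical names; the copy step is what keeps entities from crossing the boundary:

    // Entity in the inner layer, with business behavior.
    public class Order
    {
        public string Number { get; private set; }
        public decimal Total { get; private set; }

        public Order(string number, decimal total)
        {
            Number = number;
            Total = total;
        }

        public void ApplyDiscount(decimal rate)
        {
            Total -= Total * rate;
        }
    }

    // Simple data structure that crosses the boundary.
    public class OrderDetailsResponse
    {
        public string Number;
        public decimal Total;
    }

    public interface IShowOrderDetailsOutputPort
    {
        void Present(OrderDetailsResponse response);
    }

    public class ShowOrderDetailsInteractor
    {
        private readonly IShowOrderDetailsOutputPort outputPort;

        public ShowOrderDetailsInteractor(IShowOrderDetailsOutputPort outputPort)
        {
            this.outputPort = outputPort;
        }

        public void Show(Order order)
        {
            // The copy effort: only data leaves the use case layer, so a
            // presenter can never invoke ApplyDiscount by accident.
            outputPort.Present(new OrderDetailsResponse
            {
                Number = order.Number,
                Total = order.Total
            });
        }
    }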
From my point of view, which is a little more influenced by Domain-Driven Design and not only Clean Architecture, I see repositories as clear members of the domain layer (the business or core layer, as it is mostly referred to in clean architecture).
There are mainly two reasons for that:
1. Repositories are domain services that contain logic that has a meaning to business people.
Like finding an entity (aggregate) based on different criteria. For instance, a customer support employee on a support call might have to search for an order either by the order number or, if the customer does not have that, by customer name and other criteria. The order repository would provide something like findOrderByCustomerName(), which reflects the business language and needs. So repositories can contain more than just methods to retrieve and save the entity. In some cases they can also provide a primitive value. For instance, a repository for incident tickets could give you the number of unresolved tickets for a specific product. Again, it needs to fulfill some purpose related to business logic.
2. The domain (business) layer itself might need to consume a repository in order to perform some business logic.
If business logic does not fit with one single entity it is often the case that some kind of domain service is needed that contains the required logic. And this service containing business logic could need access to a repository to fulfill its tasks.
Let's say we have some company-internal meeting room reservation application. If you want to make a new appointment, there might be business logic that says it must not be possible to schedule an appointment if there is already another appointment at the same time in the same meeting room. So the domain logic to ask whether the requested meeting room is still free at that time might be part of the meeting room repository, providing something like IsRoomFreeAt(RoomId roomId, DateTime requestedMeetingTime). And there could be a service in the domain layer executing all the related business logic which needs this repository's functionality. Even if you don't want to apply that query pattern and would rather just fetch the respective meeting room and check its free status directly, the result would be the same.
Whether the one approach or the other is better suited for the respective application use case is another question. The important part here is that the business logic for making sure no rooms are double-booked requires some information from a repository, no matter how it gets it.
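Here is a minimal sketch of the repository-query approach, with hypothetical names; the point is only that both the interface and the service that needs it live in the domain layer:

    using System;

    public class RoomId
    {
        public Guid Value { get; private set; }
        public RoomId(Guid value) { Value = value; }
    }

    // Repository interface declared in the domain layer.
    public interface IMeetingRoomRepository
    {
        bool IsRoomFreeAt(RoomId roomId, DateTime requestedMeetingTime);
    }

    // Domain service enforcing the "no double booking" rule.
    public class AppointmentScheduler
    {
        private readonly IMeetingRoomRepository meetingRooms;

        public AppointmentScheduler(IMeetingRoomRepository meetingRooms)
        {
            this.meetingRooms = meetingRooms;
        }

        public void Schedule(RoomId roomId, DateTime requestedMeetingTime)
        {
            if (!meetingRooms.IsRoomFreeAt(roomId, requestedMeetingTime))
                throw new InvalidOperationException(
                    "The room is already booked at the requested time.");
            // ...create and persist the appointment via the repository
        }
    }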
If the repository were located in the application layer, the dependencies would no longer all be pointing inwards, because the domain (business) layer would then have a dependency on the application layer and not just the other way around.

Does this "overly general" type of programming have a name?

Anyone who has experience with the Salesforce platform will know it can essentially be used as a backend for a lot of web applications. It lets the end user define custom objects and the fields on those objects. So, for instance, rather than having some entity as a strongly-typed class in the code, they have a generic "custom object" whose behaviour and data are defined by the fields you choose and the triggers and rules you apply to it. That way they don't have to update the code, recompile, and redeploy every time a user adds one (which, given that they are a web service, would be both impractical and cause serious downtime, a lot).
I was thinking about how this could be implemented. Salesforce may do it in a very complex way, but I'm specifically thinking about how I can implement it myself. So far I've come up with this:
An "object defintion", which contains all the metadata for a specific record type. Equivalent to a hardcoded class definition.
A generic "record", probably with some sort of dictionary/map tying values to field identifiers that exist in the object definition.
When operating on user data, both the record and the object definition need to be in memory so that the integrity of the data can be checked. Behaviour normally provided by methods can be applied using some kind of trigger system (again, I'm using a Salesforce example here because it's the best example I know of) with defined actions/events. A rough sketch of this follows below.
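Something like this is what I have in mind (a minimal, hypothetical sketch; not how Salesforce actually does it):

    using System;
    using System.Collections.Generic;

    public enum FieldType { Text, Number, Date }

    public class FieldDefinition
    {
        public string Name;
        public FieldType Type;
        public bool Required;
    }

    // The "object definition": metadata standing in for a hardcoded class.
    public class ObjectDefinition
    {
        public string ObjectName;
        public Dictionary<string, FieldDefinition> Fields =
            new Dictionary<string, FieldDefinition>();
    }

    // The generic "record": field identifiers mapped to values.
    public class Record
    {
        public ObjectDefinition Definition;
        public Dictionary<string, object> Values =
            new Dictionary<string, object>();

        // Both record and definition are in memory, so integrity can be checked.
        public void Validate()
        {
            foreach (FieldDefinition field in Definition.Fields.Values)
            {
                if (field.Required && !Values.ContainsKey(field.Name))
                    throw new InvalidOperationException(
                        "Missing required field: " + field.Name);
            }
        }
    }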
This whole system seems very clunky, slow (without serious optimisation), and like it would be prone to problems which wouldn't plague 99% of software projects, so I'd like to learn more about it, but I have no idea where to start looking.
Is the idea I've laid out above already an existing paradigm and if so what is it called?
You have encountered custom fields. The design enables tenant-specific fields on a fixed entity. Since multi-tenancy at the highest level demands that a single codebase/database be used for all tenants, while still offering the option of full customization, this design is the best approach. The link below points to a patent that was granted for managing custom fields per tenant.
https://www.google.com/patents/US7779039

MVC: Where should I put this data access logic?

I've been trying to adhere to a strict interpretation of MVC in rebuilding a personal web application at home, but I'm having some issues.
The application is a financial tracking application. I'm stuck at the first page, which is just a list of the bank account transactions for the month. The transaction list is my Model. The Controller and View are easy enough to envision at this point.
However, I also have two buttons at the top of the page. They are arrows, one corresponding to the previous month, and the other corresponding to the next month's transaction list. I don't want these buttons enabled if there are no transactions for the previous/next month, so I need to talk to the database to figure out whether each button should actually link somewhere.
From most of what I've read, to the largest extent possible database access should be encapsulated in models, with little to no database access in Controllers and Views.
However, these buttons essentially need to ask the database, "Are there any transactions for the next/previous months?" The answer will determine whether or not their links are disabled, as well as where to send the user.
Strictly speaking, putting their logic into the Transaction List model does not seem appropriate, as whether or not transactions exist outside the requested range is beyond the concern of the Transaction List Model.
I guess I could also make another Model that would correspond to all Year-Month combinations where transactions exist. I could then pass this Model and the Transaction List on to the proper View.
Or should I deviate from the no-database-access-outside-the-model paradigm, and just throw some quick DB querying into the View (since the result is essentially a UI concern, making navigation easier)?
What do you guys think about this conceptual issue?
BTW, I am not using any framework. This is a pet project designed to get me more familiar with the nuts and bolts of the MVC pattern.
... should I deviate from the no-database-access-outside-the-model paradigm, and just throw some quick DB querying into the View ...
The short answer is: no. If you're truly interested in learning MVC and better coding in general, taking shortcuts "just because" is a very bad idea. Domain logic separation is a fundamental concept of MVC, and to violate it would essentially be moving out of MVC into territory that is dominated by terrible code and "MVC Frameworks." If you want to learn "the nuts and bolts of MVC," you'll have to understand that the nuts and bolts are things like separation of concerns, and they are very important.
A lot of the text in your question suggests a few mixed ideas about what a model should be, so I'm going to redirect you to the most linked-to answer regarding MVC models. That answer encompasses everything you'll want to know about the concept of models. To summarize the most important concept from it: a model is not a class, it's a layer. The model layer has many components that each do their own thing, which leaves you with a testable, extensible, semantic, and logical domain logic layer.
Just remember to be careful: There is way more mis-information out there about MVC than there is reliable information. The worst thing you can do to learn proper MVC is look inside of a framework that claims to be MVC.
edit: I assumed the language was PHP while responding, so some of my answer might reflect that assumption. However, the concepts are applicable across nearly all languages.
The application is a financial tracking application. I'm stuck at the first page, which is just a list of the bank account transactions for the month. The transaction list is my Model. The Controller and View are easy enough to envision at this point.
Everything on your web page is categorised in the View domain. A "list" is a visual object.
For a web application/page, the Model exists on your server, within your database. Everything that exists on the web page (buttons, forms, list boxes, etc.) is your View domain.
From most of what I've read, to the largest extent possible database access should be encapsulated in models, with little to no database access in Controllers and Views.
The View objects traditionally talk directly to your Model objects, which means you have to provide a way for your View objects to talk to your database.
An example:
http://server/transactions (get all transactions)
http://server/transactions?from=x (get transactions from x)
http://server/transactions?from=x&to=y (get transactions from x to y)
These could be your View's interface to your Model.
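For the month-navigation buttons specifically, one way to keep every query inside the Model layer (sketched in C# purely for illustration; the names are hypothetical) is to give the model a small calendar-style component that the controller or view consults:

    using System;

    // Part of the model layer: knows which months contain transactions.
    public interface ITransactionCalendar
    {
        bool HasTransactionsIn(int year, int month);
    }

    // Controller-side usage: no SQL here, just questions asked of the model.
    public class TransactionListController
    {
        private readonly ITransactionCalendar calendar;

        public TransactionListController(ITransactionCalendar calendar)
        {
            this.calendar = calendar;
        }

        // The view uses these flags to enable/disable the arrow buttons.
        public bool CanGoToPreviousMonth(int year, int month)
        {
            DateTime previous = new DateTime(year, month, 1).AddMonths(-1);
            return calendar.HasTransactionsIn(previous.Year, previous.Month);
        }

        public bool CanGoToNextMonth(int year, int month)
        {
            DateTime next = new DateTime(year, month, 1).AddMonths(1);
            return calendar.HasTransactionsIn(next.Year, next.Month);
        }
    }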

Does Domain-Driven Design require implementing the business logic outside the domain objects?

The model of the domain consists of my entities used as POCOs, which means no base class, no interfaces, and no attributes.
So the business logic, like validation rules, must live outside of the entities (an anemic domain model).
Would this comply with Domain Driven Design?
No. Not really.
The main aim of domain-driven design is to capture and encapsulate the business domain in the model as explicitly as possible. Business always involves behavior, therefore your objects are supposed to have behavior too.
The model of the domain are my entities used as POCOs which means no base class, no interfaces around and no Attributes.
...and no C# or .NET CLR should be used; that's infrastructure, right? ;)
Those are tools to express your model. You should try to keep the noise level down and separate your model, but you won't be able to run away completely, because it's a model of real life expressed in a programming language and the technology around it.
By the way, you might want to investigate the idea of never allowing a domain object to be in an invalid state. And if it feels like this particular kind of validation does not relate to the business, then it is not supposed to be in the domain model in the first place.
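A minimal sketch of that idea (hypothetical names): the rule lives inside the entity, so an invalid instance simply cannot exist.

    using System;

    public class Customer
    {
        public string Email { get; private set; }

        public Customer(string email)
        {
            ChangeEmail(email); // the invariant holds from construction onwards
        }

        public void ChangeEmail(string email)
        {
            // Behavior and its validation rule live together in the entity.
            if (string.IsNullOrEmpty(email) || !email.Contains("@"))
                throw new ArgumentException("A valid email address is required.");
            Email = email;
        }
    }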
That's a really philosophical question. I really want to give an equally philosophical answer, so here goes:
As I have understood domain-driven design, the most important thing is that whoever knows something does things with that knowledge. I believe this to be intertwined with this article.
With this in mind, your plain old objects should have the means of performing their "life or death" important tasks (which would make your solution wrong).
However, another way of looking at it would be that these plain old objects are the tiniest available sets of data, almost like primitives. What happens then is that the objects owning these data objects are the actual model objects within the domain-driven design, and they don't have to correlate perfectly (which would make your solution correct).
This could easily happen if the model and the data layer are designed by two completely independent designers, or if one person is capable of switching hats. I'm thinking this could be a good thing, though! Let me give an example:
A forum
What do we need? We need users, boards, threads, and posts. The last three all have a "one to many" relationship in the data layer: one board has many threads, and one thread has many posts. One user also has many posts, and one user starts many threads (which could be derived by finding the author of the first post in a thread, so it might not have to be stored in the data layer). But what is going on in the presentation layer?
When viewing a board, we will want to see all available threads in that board, but we won't be satisfied with just seeing the name of each thread and the name of the user who started it. We also want to see the number of posts in each thread, plus the name of the last poster in the thread and the time of that posting.
We are now looking at a model object which is somewhat out of sync with the data layer. It will contain business logic to calculate the needed data from the given data objects, and then it will be able to load some sort of view with the data that the view wants. No getters or setters will be needed in the model, so encapsulation is never broken. The model object conforms to the domain, which should be dependent on the usability demands, not the limitations of data storage. The data objects conform to the old data-storing style.
This would give us a data abstraction layer (with the POCOs), MVC, and domain-driven design. Win? :)
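To make the forum example a bit more concrete, here is a loose sketch (it exposes read-only properties where, strictly, the model might push data straight to a view; all names are made up):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // "Primitive" data objects, mirroring the data layer.
    public class Post
    {
        public string Author;
        public DateTime PostedAt;
    }

    public class ThreadData
    {
        public string Title;
        public List<Post> Posts = new List<Post>();
    }

    // Model object in the domain's terms, out of sync with the data layer:
    // it computes what the board view actually wants to show.
    public class ThreadSummary
    {
        private readonly ThreadData data;

        public ThreadSummary(ThreadData data)
        {
            this.data = data;
        }

        public string Title { get { return data.Title; } }
        public int PostCount { get { return data.Posts.Count; } }

        // Derived, not stored: the starter is the author of the first post.
        public string StartedBy
        {
            get { return data.Posts.OrderBy(p => p.PostedAt).First().Author; }
        }

        public Post LastPost
        {
            get { return data.Posts.OrderByDescending(p => p.PostedAt).First(); }
        }
    }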

Repository pattern with "modern" data access strategies

So I was searching the web looking for best practices when implementing the repository pattern with multiple data stores when I found my entire way of looking at the problem turned upside down. Here's what I have...
My application is a BI tool pulling data from (as of now) four different databases. Due to internal constraints, I am currently using LINQ-to-SQL for data access but require a design that will allow me to change to Entity Framework or NHibernate or the next data access du jour. I also hold steadfast to decoupled layers in my apps using an IoC framework (Castle Windsor in this case).
As such, I've used the Repository pattern to abstract the actual data access code from my business layer. As a result, my business object is coded against some I<Entity>Repository interface and the IoC Container is used to manage the actual implementation. In this case, I would expect to have a concrete Linq<Entity>Repository that implements the interface using LINQ-to-SQL to do the work. Later I could replace this with an EF<Entity>Repository with no changes required to my business layer.
Also, because I'm coding against the interface, I can easily mock the repository for unit testing purposes.
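For what it's worth, a minimal sketch of that arrangement (illustrative names; the Windsor registration is from memory, so treat it as approximate):

    using Castle.MicroKernel.Registration;
    using Castle.Windsor;

    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public interface ICustomerRepository
    {
        Customer GetByKey(int id);
        void Add(Customer customer);
    }

    // LINQ-to-SQL-backed implementation; the DataContext details are elided.
    public class LinqCustomerRepository : ICustomerRepository
    {
        public Customer GetByKey(int id)
        {
            // would query the DataContext here
            return null;
        }

        public void Add(Customer customer)
        {
            // would call InsertOnSubmit + SubmitChanges here
        }
    }

    public static class Bootstrapper
    {
        public static IWindsorContainer Configure()
        {
            var container = new WindsorContainer();
            // Swap ImplementedBy to an EF-backed repository later with no
            // changes required to the business layer.
            container.Register(
                Component.For<ICustomerRepository>()
                         .ImplementedBy<LinqCustomerRepository>());
            return container;
        }
    }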
So the first question that I have as I begin coding the application is whether I should have one repository per DataContext or per entity (as I've typically done)? Let's say one database contains Customers and Sales with the expected relationship. Should I have a single OrderTrackingRepository with methods that work with both entities or have a separate CustomerRepository and a different SalesRepository?
Next, as a BI tool, the primary interface is for reporting, charting, etc and often will require a "mashup" of data across multiple sources. For instance, the reality is that one database contains customer information while another handles sales information and a third holds other financial information but one of my requirements is to display aggregated information that spans all three. Plus, I have to support dynamic filtering in the UI. Obviously working directly against the LINQ-to-SQL or EF DataContext objects (Table<Entity>, for instance) will allow me to pretty much do anything. What's the best approach to expose that same functionality to my business logic when abstracting the DAL with a repository interface?
This article indicates that EF4 has turned this approach around and that the repository is nothing more than an IQueryable returned from the EF DataContext, which brings up a whole other set of questions.
But, I think I've rambled on enough...
UPDATE (Thanks, Steven!)
Okay, let me put a more tangible (for me, at least) example on the table and clarify a few points that will hopefully lead to an approach I can better wrap my head around.
While I understand what Steven has proposed, I have a team of developers I have to consider when implementing such things and I'm afraid they will get lost in the complexity (yes, a real problem here!).
So, let's remove any direct tie-in with LINQ-to-SQL, because I don't want a solution that is dependent upon the way L2S works, or even EF, for that matter. My intent has been to abstract away the data access technology being used so that I can change it as needed without requiring collateral changes to the consuming code in my business layer. I've accomplished this in the past by presenting the business layer with IRepository interfaces to work against. Perhaps these should have been named IUnitOfWork or, more to my liking, IDataService, but the goal is the same. These interfaces typically exposed methods such as Add, Remove, Contains and GetByKey, for example.
Here's my situation. I have three databases to work with. One is DB2 and contains all of the business information for a customer (franchise), such as their info and their Products, Orders, etc. Another, a SQL Server database, contains their financial history, while a third SQL Server database contains application-specific information. The first two databases are shared by multiple applications.
Through my application, the customer may enter/upload their financial information for a given time period. When entered, I have to perform the following steps:
1. Validate the entered data against a set of static rules. For example, the data must contain a legitimate customer ID value (in the case of an upload). This requires a lookup in the DB2 database to verify that the supplied customer ID exists and is current.
2. Next, I have to validate the data against a set of dynamic rules which are contained in the third (SQL Server) database. An example may be that a given value cannot exceed a certain percentage of another value.
3. Once validated, I persist the data to the second SQL Server database containing the financial data.
All the while, my code must have loosely-coupled dependencies so I may mock them in my unit tests.
As part of the analysis, I know that I have three distinct data stores to work with and about a half-dozen or so entities (at this time) that I am working with. In generic terms, I presume that I would have three DataContexts in my application, one per data store, with the entities exposed by the appropriate data context.
I could then create a separate I{repository|unit of work|service} for each entity that would be consumed by my business logic, with a concrete implementation that knows which data context to use. But this seems to be a risky proposition: as the number of entities increases, so does the number of individual repository/UoW/service types.
Then, take the case of my validation logic which works with multiple entities and, thereby, multiple data contexts. I'm not sure this is the most efficient way to do this.
The other requirement that I have yet to mention is on the reporting side where I will need to execute some complex queries on the data stores. As of right now, these queries will be limited to a single data store at a time, but the possibility is there that I might need to have the ability to mash data together from multiple sources.
Finally, I am considering the idea of pulling out all of the data access stuff for the first two (shared) databases into their own project and have been looking at WCF Data Services as a possible approach. This would give me the basis for a consistent approach for any application making use of this data.
How does this change your thinking?
In your case I would recommend returning IEnumerables for your data queries from the repo. I usually aggregate calls from multiple repos through a service class that represents the domain problem and encapsulates my business logic; a sketch follows below. To keep it clean, I try to keep my repos focused on the domain problem. I liken my DataContext to a repo, and I extract an interface using a T4 template to make mocking easier. But there is nothing stopping you from using a traditional repo that encapsulates your calls. Doing it this way will allow you to switch ORMs at any stage.
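Something along these lines, sketched against your three stores (all names hypothetical):

    using System;
    using System.Collections.Generic;

    public class FinancialStatement
    {
        public string CustomerId;
        public Dictionary<string, decimal> Figures =
            new Dictionary<string, decimal>();
    }

    // One focused repo per store.
    public interface ICustomerRepository      // backed by the DB2 database
    {
        bool Exists(string customerId);
    }

    public interface IRuleRepository          // backed by the app-specific database
    {
        IEnumerable<Func<FinancialStatement, bool>> GetRulesFor(string customerId);
    }

    public interface IFinancialsRepository    // backed by the financial database
    {
        void Save(FinancialStatement statement);
    }

    // Service class that represents the domain problem and encapsulates the
    // business logic, aggregating calls across the repos.
    public class FinancialSubmissionService
    {
        private readonly ICustomerRepository customers;
        private readonly IRuleRepository rules;
        private readonly IFinancialsRepository financials;

        public FinancialSubmissionService(ICustomerRepository customers,
            IRuleRepository rules, IFinancialsRepository financials)
        {
            this.customers = customers;
            this.rules = rules;
            this.financials = financials;
        }

        public void Submit(FinancialStatement statement)
        {
            // Static rule: the customer ID must exist and be current.
            if (!customers.Exists(statement.CustomerId))
                throw new InvalidOperationException("Unknown customer ID.");

            // Dynamic rules loaded from the third database.
            foreach (var rule in rules.GetRulesFor(statement.CustomerId))
                if (!rule(statement))
                    throw new InvalidOperationException("A dynamic rule failed.");

            financials.Save(statement);
        }
    }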
EDIT: IQueryable IS NOT THE ANSWER! :-)
I have also done a lot of work in this area, and INITIALLY came to the same conclusion, however it is NOT a good solution. The point of the repo is to abstract queries into discrete chunks of work. Exposing IQueryable is too ad hoc and raises some issues later down the line. You lose your ability to scale. You lose your ability to optimize queries (let's say I want to move to a highly optimized stored proc). You lose your ability to use IoC for the repo to switch out data access layers (switch the project from SQL to Mongo). You lose your ability to provide effective data caching in the repo (which is a major strength of the repo pattern). I would recommend taking a CLOSE look at WHY we have the repo pattern. It isn't simply an "ORM" mapping layer. What made this really clear to me was the CQRS pattern.
Further to this, allowing the ad-hoc nature of IQueryable opens you up to ill-fitting reuse of queries. It is GENERALLY not a good idea to reuse queries, since from query to query you see slight deviations, which ends up with two byproducts: queries become too broad and inefficient, and queries become riddled with unmaintainable IF THEN statements to cater for the deviations.
IQueryable is easy, but opens you up to an unmaintainable mess.
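To illustrate the difference (hypothetical names):

    using System.Collections.Generic;
    using System.Linq;

    public class Order
    {
        public int CustomerId;
        public decimal Total;
    }

    // Leaky: every caller composes its own query, so caching, query
    // optimization, and a later move to stored procs or another store
    // all become much harder.
    public interface ILeakyOrderRepository
    {
        IQueryable<Order> Orders { get; }
    }

    // Discrete chunks of work: the intent is named, and the implementation
    // is free to cache, optimize, or swap the underlying data access.
    public interface IOrderRepository
    {
        IList<Order> GetLargeOrdersFor(int customerId, decimal minimumTotal);
    }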
Look at this SO answer. I think it shows a simplified model of what you want. IQueryable<T> is indeed our new Repository :-). DataContext and ObjectContext are our Unit of Work.
UPDATE 2:
Here is a blog post that describes the model you might be looking for.
UPDATE 3
It would be wise to hide the shared databases behind a service. This will solve several problems:
This will make the database private to the service, which makes it much easier to change the implementation when needed.
You can put the needed validation logic (for database 1) in that service and can create tests for that validation logic in that project.
Clients accessing that service can assume correctness of the service, and its validation logic.
The result of this is that your application will send data to the service to validate it, call the service to fetch data, query its own private database (database 3), and join the data from the three data sources together locally. I've never been a fan of using cross-database or even cross-server (in your situation) database calls and letting the database join everything together. Transactions will be promoted to distributed transactions, and it's hard to predict how much data the servers will exchange.
When you abstract the shared databases behind the service, things get easier (at least from your application's point of view). Your application calls services it trusts which limits the amount of code in that application and the amount of tests. You still want to mock the calls to such a service, but that would be pretty easy. It should also solve the problem of validating over multiple data sources.
Validation is always a hard part. I'm very familiar with the Validation Application Block, and love it for its flexibility. It isn't an easy framework, however, but you might take a peek at what you can do with it. For instance, I've written several articles about integration with O/RM tools and how to 'embed' a context (context as in DataContext/Unit of Work) in Validation Application Block.
Please have a look at my IRepository pattern implementation using EF 4.0.
My solution has the following features:
Supports connections to multiple databases
One repository per entity
Support for execution of queries
Unit of work pattern implementation
Support for validating entities using VAB guidance
Common operations are kept at the base class level. High use of OOP techniques for code re-usability and ease of maintenance.
