When it comes to designing the architecture of a system and its underlying services (consider a SOA), the database models can be designed in a few ways. The common one is entity-based, which speaks for itself: the business logic is built around the entities (e.g. user, company, product). But when resource-based design comes into the picture, it gets confusing, and the problem continues when my searches on Google only turn up very abstract or ambiguous information.
My focus here is on a CRM (Customer Relationship Management) service, but I think it is better for me to understand resource-based structure in general, so that I am able to design a service in such a way.
Can someone provide a concise explanation of resource-based structure and maybe compare it with entity-based?
You may find the most concise explanation in Fielding's dissertation on REST:
A resource is a conceptual mapping to a set of entities
If that's the context you were looking for, you can read more here
Events in an event store (event sourcing) are most often persisted in a serialized format, with versions to represent a change in the model or schema for an event type. I haven't been able to find good documentation showing the actual model or schema for a concrete event (often the events table in the event store schema, if using an RDBMS), but I understand that ideally it should be generic.
What are the most basic fields/properties that should exist in an event?
I've contemplated using json-api as a specification for my events but perhaps that's too "heavy". The benefits I see are flexibility and maturity.
Am I heading down the "wrong path"?
Any well defined examples would be greatly appreciated.
Don't overlook forward and backward compatibility.
You should plan to review Greg Young's book on event versioning; it doesn't directly answer your question, but it does cover a lot about the basics of interpreting an event.
Short answer: pretty much everything is optional, because you need to be able to change it later.
You should also review Hohpe's Enterprise Integration Patterns, in particular his work on messaging, which details a lot of cases you may care about.
de Graauw's Nobody Needs Reliable Messaging helped me to understand an important point.
To summarize: if reliability is important on the business level, do it on the business level.
So while there are some interesting bits of meta data tracking that you may want to do, the domain model is really only going to look at the data; and that is going to tend to be specific to your domain.
You also have the fun that the representation of events that you use in the service that produces them may not match the representation that it shares with other services, and in particular may not be the same message that gets broadcast.
I worked through an exercise trying to figure out the minimum amount of information a subscriber needs in order to look at an event and decide whether it cares. My answers were an id (have I seen this specific event before?), a token that tells you the semantic meaning of the message (is this something I care about?), and a location (URI) to get a richer representation if it is something I care about.
But outside of the domain -- for example, when you are looking at the system as a whole trying to figure out what is going on, having correlation identifiers and causation identifiers, time stamps, signatures of the source location, and so on stored in a consistent location in the meta data can be a big help.
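To make that concrete, here is a rough sketch of such a minimal envelope in C#; the property names are my own choices for illustration, not a prescribed schema:

```csharp
using System;

// A minimal event envelope: just enough for a subscriber to decide whether it
// cares, plus optional metadata for cross-system diagnostics.
// All names here are illustrative, not a prescribed schema.
public class EventEnvelope
{
    // Identity: "have I seen this specific event before?"
    public Guid EventId { get; set; }

    // Semantic meaning: "is this something I care about?"
    public string EventType { get; set; }      // e.g. "invoice-paid"
    public int SchemaVersion { get; set; }     // supports up-/down-casting older events

    // Where to fetch a richer representation if it is interesting.
    public Uri Payload { get; set; }

    // Diagnostic metadata, useful when looking at the system as a whole.
    public Guid CorrelationId { get; set; }
    public Guid CausationId { get; set; }
    public DateTimeOffset OccurredAt { get; set; }
    public string Source { get; set; }         // who/where produced it
}
```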
Just modelling with basic types that map cleanly to JSON, as you would for an API, can go a long way.
You can spend a lot of time generating overly complex models if you throw too much tooling at it - things like Apache Thrift and/or Protocol Buffers (or derived things) will provide all sorts of IDL mechanisms for you to generate incidental complexity with.
In .NET land, and on many other platforms, if you namespace the types you can do various projections from them.
Personally, I've used records and DUs in F# as a design and representation tool
You get IntelliSense, syntax highlighting, and types you can use from F# or C# for free.
If someone wants to look, types.fs has all they need.
Anyone who has experience with the Salesforce platform will know it can essentially be used as a backend for a lot of web applications. They let the end user define custom objects and the fields on those objects. So for instance, rather than having some entity as a strongly-typed class in the code, they have a generic "custom object" whose behaviour and data are defined by the fields you choose and the triggers and rules you apply to it. That way they don't have to update the code, recompile and redeploy every time a user adds one (which, given that they are a web service, would be impractical and would cause a lot of serious downtime).
I was thinking about how this could be implemented. Salesforce may do it in a very complex way, but I'm specifically thinking about how I can implement this. So far I've come up with this:
An "object defintion", which contains all the metadata for a specific record type. Equivalent to a hardcoded class definition.
A generic "record", probably with some sort of dictionary/map tying values to field identifiers that exist in the object definition.
When operating on user data, both the record and the object definition need to be in memory so that the integrity of the data can be checked. Behaviour normally provided by methods can be applied using some kind of trigger system (again, I'm using a Salesforce example here because it's the best example I know of) with defined actions/events.
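To make the idea concrete, here is a very rough sketch of that shape in C#; all type and member names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

// Metadata describing a user-defined record type (the "hardcoded class" equivalent).
public class ObjectDefinition
{
    public string Name { get; set; }                                      // e.g. "Invoice__c"
    public Dictionary<string, Type> Fields { get; } = new Dictionary<string, Type>();
}

// A generic record: field identifiers mapped to values, checked against its definition.
public class Record
{
    private readonly ObjectDefinition _definition;
    private readonly Dictionary<string, object> _values = new Dictionary<string, object>();

    public Record(ObjectDefinition definition) { _definition = definition; }

    public void Set(string field, object value)
    {
        // Integrity check: the field must exist in the definition and have the right type.
        if (!_definition.Fields.TryGetValue(field, out var expectedType))
            throw new ArgumentException($"Unknown field '{field}' on {_definition.Name}");
        if (value != null && !expectedType.IsInstanceOfType(value))
            throw new ArgumentException($"Field '{field}' expects {expectedType.Name}");
        _values[field] = value;
    }

    public object Get(string field) => _values.TryGetValue(field, out var v) ? v : null;
}
```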
This whole system seems very clunky, slow (without serious optimisation), and like it would be prone to problems which wouldn't plague 99% of software projects, so I'd like to learn more about it, but I have no idea where to start looking.
Is the idea I've laid out above already an existing paradigm, and if so, what is it called?
You have encountered custom fields. The design enables tenant-specific fields on top of a fixed entity. Multi-tenancy at the highest level demands that a single codebase and database be used for all tenants while still offering full customization, and this design is the best approach for that. The link below points to a patent that was granted for managing custom fields per tenant.
https://www.google.com/patents/US7779039
The model of my domain consists of entities used as POCOs, which means no base class, no interfaces and no attributes.
So business logic like validation rules must live outside of the entities (an Anemic Domain Model).
Would this comply with Domain Driven Design?
No. Not really.
The main aim of domain driven design is to capture and encapsulate the business domain in the model as explicitly as possible. A business always contains behavior, therefore your objects are supposed to have behavior too.
"The model of my domain consists of entities used as POCOs, which means no base class, no interfaces and no attributes."
...and no C#, and no .NET CLR either; that's infrastructure, right? ;)
Those are tools to express your model. You should try to keep the noise level down and separate your model, but you won't be able to run away completely, because it's a model of real life expressed in a programming language and the technology around it.
By the way, you might want to investigate the idea of never allowing a domain object to be in an invalid state. And if it feels like this particular kind of validation does not relate to the business, then it does not belong in the domain model in the first place.
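As a hedged sketch of what that looks like in practice (the Order example and its rules are invented, not taken from your domain):

```csharp
using System;

// A domain entity that guards its own invariants instead of relying on an
// external validator: it cannot be constructed in an invalid state, and its
// behavior lives on the object itself rather than in a service.
public class Order
{
    public Guid Id { get; }
    public string CustomerId { get; }
    public bool IsShipped { get; private set; }

    public Order(Guid id, string customerId)
    {
        if (id == Guid.Empty)
            throw new ArgumentException("Order needs an id.", nameof(id));
        if (string.IsNullOrWhiteSpace(customerId))
            throw new ArgumentException("Order needs a customer.", nameof(customerId));
        Id = id;
        CustomerId = customerId;
    }

    // Behavior and its business rule live together on the entity.
    public void Ship()
    {
        if (IsShipped)
            throw new InvalidOperationException("Order is already shipped.");
        IsShipped = true;
    }
}
```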
That's a really philosophical question. I really want to give an equally philosophical answer, so here goes:
As I have understood domain driven design, the most important thing is that whoever knows something, does things with that knowledge. I believe this to be intertwined with this article.
With this in mind, your plain old objects should have the means of performing their "life or death"-important tasks (which makes your solution wrong).
However, another way of looking at it would be that these plain old objects are the tiniest available sets of data, almost like primitives. The objects owning these data objects are then the actual model objects within the domain driven design, and they don't have to correlate perfectly with the data objects (which would make your solution correct).
This could easily happen if the model and the data layer are designed by two completely independent designers, or if one person is capable of switching hats. I'm thinking this could be a good thing, though! Let me give an example:
A forum
What do we need? We need users, boards, threads, and posts. The last three all have a "one to many" relationship in the data layer. One board has many threads, and one thread has many posts. One user also has many posts, and one user starts many threads (this could be derived by finding the author of the first post in a thread, so it might not have to be stored in the data layer). But what is going on in the presentation layer?
When viewing a board, we will want to see all available threads in that board. But we won't be satisfied with just the name of each thread and the name of the user who started it; we also want to see the number of posts in each thread, plus the name of the last poster in the thread and the time of that post.
We are now looking at a model object which is somewhat out of sync with the data layer. It will contain business logic to calculate the needed data from the given data objects, and then it will be able to load some sort of view with the data that the view wants. No getters or setters will be needed in the model, so encapsulation is never broken. The model object conforms to the domain, which should be dependent on the usability demands, not on the limitations of data storage. The data objects conform to the old data-storing style.
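A loose sketch of that idea in C# (all names invented): the model object derives what the board view needs from the underlying data objects and pushes it into the view, so it never exposes its internals through getters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain data objects, close to the storage shape (names invented for the sketch).
public class Post { public string Author; public DateTime PostedAt; }
public class ForumThread { public string Title; public List<Post> Posts = new List<Post>(); }

// What a board view wants to show for each thread row.
public interface IThreadRowView
{
    void Show(string title, string startedBy, int postCount, string lastPoster, DateTime lastPostedAt);
}

// The model object: calculates the needed data from the data objects and loads the view,
// so no getters or setters are required on the model itself.
public class ThreadOverview
{
    private readonly ForumThread _thread;

    public ThreadOverview(ForumThread thread) { _thread = thread; }

    public void LoadInto(IThreadRowView view)
    {
        var ordered = _thread.Posts.OrderBy(p => p.PostedAt).ToList();
        view.Show(
            _thread.Title,
            ordered.First().Author,          // the author of the first post started the thread
            ordered.Count,
            ordered.Last().Author,
            ordered.Last().PostedAt);
    }
}
```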
This would give us a data abstraction layer (with the POCOs), MVC, and domain driven design. Win? :)
So I was searching the web for best practices when implementing the repository pattern with multiple data stores, and I found my entire way of looking at the problem turned upside down. Here's what I have...
My application is a BI tool pulling data from (as of now) four different databases. Due to internal constraints, I am currently using LINQ-to-SQL for data access but require a design that will allow me to change to Entity Framework or NHibernate or the next data access du jour. I also hold steadfast to decoupled layers in my apps using an IoC framework (Castle Windsor in this case).
As such, I've used the Repository pattern to abstract the actual data access code from my business layer. As a result, my business object is coded against some I<Entity>Repository interface and the IoC Container is used to manage the actual implementation. In this case, I would expect to have a concrete Linq<Entity>Repository that implements the interface using LINQ-to-SQL to do the work. Later I could replace this with an EF<Entity>Repository with no changes required to my business layer.
Also, because I'm coding against the interface, I can easily mock the repository for unit testing purposes.
So the first question that I have as I begin coding the application is whether I should have one repository per DataContext or per entity (as I've typically done)? Let's say one database contains Customers and Sales with the expected relationship. Should I have a single OrderTrackingRepository with methods that work with both entities or have a separate CustomerRepository and a different SalesRepository?
Next, as a BI tool, the primary interface is for reporting, charting, etc and often will require a "mashup" of data across multiple sources. For instance, the reality is that one database contains customer information while another handles sales information and a third holds other financial information but one of my requirements is to display aggregated information that spans all three. Plus, I have to support dynamic filtering in the UI. Obviously working directly against the LINQ-to-SQL or EF DataContext objects (Table<Entity>, for instance) will allow me to pretty much do anything. What's the best approach to expose that same functionality to my business logic when abstracting the DAL with a repository interface?
This article: link text indicates that EF4 has turned this approach around and that the repository is nothing more than an IQueryable returned from the EF DataContext which brings up a whole other set of questions.
But, I think I've rambled on enough...
UPDATE (Thanks, Steven!)
Okay, let me put a more tangible (for me, at least) example on the table and clarify a few points that will hopefully lead to an approach I can better wrap my head around.
While I understand what Steven has proposed, I have a team of developers I have to consider when implementing such things and I'm afraid they will get lost in the complexity (yes, a real problem here!).
So, let's remove any direct tie-in with LINQ to SQL, because I don't want a solution that is dependent upon the way L2S works - or even EF, for that matter. My intent has been to abstract away the data access technology being used so that I can change it as needed without requiring collateral changes to the consuming code in my business layer. I've accomplished this in the past by presenting the business layer with IRepository interfaces to work against. Perhaps these should have been named IUnitOfWork or, more to my liking, IDataService, but the goal is the same: these interfaces typically exposed methods such as Add, Remove, Contains and GetByKey.
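For reference, this is roughly the shape of interface I mean - a sketch, not my actual code:

```csharp
using System.Collections.Generic;

// The abstraction the business layer codes against. Whether it is called
// IRepository, IUnitOfWork or IDataService, the goal is the same:
// no LINQ-to-SQL, EF or NHibernate types leak through it.
public interface IDataService<TEntity, TKey>
{
    void Add(TEntity entity);
    void Remove(TEntity entity);
    bool Contains(TEntity entity);
    TEntity GetByKey(TKey key);
    IEnumerable<TEntity> GetAll();
}
```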
Here's my situation. I have three databases to work with. One is DB2 and contains all of the business information for a customer (franchise), such as their info and their Products, Orders, etc. Another, a SQL Server database, contains their financial history, while a third SQL Server database contains application-specific information. The first two databases are shared by multiple applications.
Through my application, the customer may enter/upload their financial information for a given time period. When entered, I have to perform the following steps:
1. Validate the entered data against a set of static rules. For example, the data must contain a legitimate customer ID value (in the case of an upload). This requires a lookup in the DB2 database to verify that the supplied customer ID exists and is current.
2. Next I have to validate the data against a set of dynamic rules which are contained in the third (SQL Server) database. An example may be that a given value cannot exceed a certain percentage of another value.
3. Once validated, I persist the data to the second SQL Server database containing the financial data.
All the while, my code must have loosely-coupled dependencies so I may mock them in my unit tests.
As part of the analysis, I know that I have three distinct data stores to work with and about a half-dozen or so entities (at this time) that I am working with. In generic terms, I presume that I would have three DataContexts in my application, one per data store, with the entities exposed by the appropriate data context.
I could then create a separate I{repository|unit of work|service} for each entity that would be consumed by my business logic, with a concrete implementation that knows which data context to use. But this seems to be a risky proposition: as the number of entities increases, so does the number of individual repository/UoW/service types.
Then, take the case of my validation logic which works with multiple entities and, thereby, multiple data contexts. I'm not sure this is the most efficient way to do this.
The other requirement that I have yet to mention is on the reporting side where I will need to execute some complex queries on the data stores. As of right now, these queries will be limited to a single data store at a time, but the possibility is there that I might need to have the ability to mash data together from multiple sources.
Finally, I am considering the idea of pulling out all of the data access stuff for the first two (shared) databases into their own project and have been looking at WCF Data Services as a possible approach. This would give me the basis for a consistent approach for any application making use of this data.
How does this change your thinking?
In your case I would recommend returning IEnumerables from your data queries in the repo. I usually aggregate calls from multiple repos through a service class that represents the domain problem and encapsulates my business logic. To keep it clean, I try to keep my repos focused on the domain problem. I liken my DataContext to a repo, and extract an interface using a T4 template to make life easier for mocking. But there is nothing stopping you from using a traditional repo that encapsulates your calls. Doing it this way will allow you to switch ORMs at any stage.
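Roughly what I mean, as a sketch with invented repository and entity names:

```csharp
using System.Collections.Generic;
using System.Linq;

// Invented entity and report types, just to make the shape visible.
public class Customer { public int Id; public string Name; }
public class Sale { public int CustomerId; public decimal Amount; }
public class SalesSummary { public string CustomerName; public decimal Total; }

// Repos return IEnumerable and stay focused on their own data store.
public interface ICustomerRepository { IEnumerable<Customer> GetAll(); }
public interface ISalesRepository { IEnumerable<Sale> GetForYear(int year); }

// A service class that represents the domain problem: it aggregates calls
// across repos and encapsulates the business logic of shaping the report.
public class SalesReportingService
{
    private readonly ICustomerRepository _customers;
    private readonly ISalesRepository _sales;

    public SalesReportingService(ICustomerRepository customers, ISalesRepository sales)
    {
        _customers = customers;
        _sales = sales;
    }

    public IEnumerable<SalesSummary> SummariseByCustomer(int year)
    {
        var customers = _customers.GetAll();
        var sales = _sales.GetForYear(year);

        // The mashup across sources happens here, in memory, not in the repos.
        return from c in customers
               join s in sales on c.Id equals s.CustomerId into g
               select new SalesSummary { CustomerName = c.Name, Total = g.Sum(x => x.Amount) };
    }
}
```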
EDIT: IQueryable IS NOT THE ANSWER! :-)
I have also done a lot of work in this area, and INITIALLY came to the same conclusion, however it is NOT a good solution. The point of the repo is to abstract queries into discrete chunks of work. Exposing IQueryable is too ad hoc and raises some issues later down the line. You lose your ability to scale. You lose your ability to optimize queries (let's say I want to move to a highly optimized stored proc). You lose your ability to use IoC for the repo to switch out data access layers (switch the project from SQL to Mongo). You lose your ability to provide effective data caching in the repo (which is a major strength of the repo pattern). I would recommend taking a CLOSE look at WHY we have a repo pattern. It isn't simply an "ORM" mapping layer. What made this really clear to me was the CQRS pattern.
Further to this, allowing the ad-hoc nature of IQueryable opens you up to ill-fitting reuse of queries. It is GENERALLY not a good idea to reuse queries, since from query to query you see slight deviations, which ends up with two byproducts: queries become too broad and inefficient, and queries become riddled with unmaintainable IF/THEN statements to cater for the deviations.
IQueryable is easy, but opens you up to an unmaintainable mess.
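To illustrate the difference, a small sketch with invented names: the first interface leaks IQueryable, the second exposes a named, discrete chunk of work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order { public bool IsPaid; public DateTime DueDate; }

// Ad hoc: callers compose whatever query they like; hard to optimize,
// cache, or swap out for a stored proc or a different store later.
public interface ILeakyOrderRepository
{
    IQueryable<Order> Orders { get; }
}

// Discrete chunk of work: the intent is named, so the implementation can be
// cached, tuned, or replaced without touching any of the callers.
public interface IOrderRepository
{
    IEnumerable<Order> GetOverdueOrders(DateTime asOf);
}
```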
Look at this SO answer. I think it shows a simplified model of what you want. IQueryable<T> is indeed our new Repository :-). DataContext and ObjectContext are our Unit of Work.
UPDATE 2:
Here is a blog post that describes the model you might be looking for.
UPDATE 3:
It would be wise to hide the shared databases behind a service. This will solve several problems:
This will make the database private to the service, which makes it much easier to change the implementation when needed.
You can put the needed validation logic (for database 1) in that service and can create tests for that validation logic in that project.
Clients accessing that service can assume correctness of the service, and its validation logic.
The result of this is that your application will send data to the service to validate it, call the service to fetch data, query its own private database (database 3), and join the data from the three data sources together locally. I've never been a fan of using cross-database or even cross-server (in your situation) database calls and letting the database join everything together. Transactions will be promoted to distributed transactions, and it's hard to predict how much data the servers will exchange.
When you abstract the shared databases behind the service, things get easier (at least from your application's point of view). Your application calls services it trusts which limits the amount of code in that application and the amount of tests. You still want to mock the calls to such a service, but that would be pretty easy. It should also solve the problem of validating over multiple data sources.
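As a sketch of what the application would see once the shared database is hidden (the interface and member names here are hypothetical):

```csharp
// What the application sees once the shared DB2 database sits behind a service.
// The interface and member names are hypothetical.
public interface IFranchiseService
{
    // Encapsulates the "customer ID exists and is current" rule (validation step 1),
    // so the application can trust the answer instead of querying DB2 itself.
    bool IsCurrentCustomer(string customerId);

    // Fetches whatever reference data the application needs for local joins and reports.
    CustomerInfo GetCustomer(string customerId);
}

public class CustomerInfo
{
    public string CustomerId { get; set; }
    public string Name { get; set; }
}
```

In unit tests you would mock IFranchiseService just as you would a repository, which keeps the application's own test surface small.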
Validation is always a hard part. I'm very familiar with the Validation Application Block, and love it for its flexibility. It isn't an easy framework, however, but you might take a peek at what you can do with it. For instance, I've written several articles about integration with O/RM tools and how to 'embed' a context (context as in DataContext/unit of work) in the Validation Application Block.
Please have a look at my IRepository pattern implementation using EF 4.0.
My solution has the following features:
Support for connections to multiple databases
One repository per entity
Support for execution of queries
Unit of work pattern implementation
Support for validating entities using VAB guidance
Common operations are kept at the base class level, with heavy use of OOP techniques for code reusability and ease of maintenance.
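A stripped-down sketch of how the common operations might sit in a base class; the IUnitOfWork shown here is a stand-in abstraction, not the actual implementation:

```csharp
using System.Collections.Generic;

// A stand-in unit-of-work abstraction; in the real solution this would wrap
// the EF context, but the shape is what matters for the pattern.
public interface IUnitOfWork
{
    void RegisterNew(object entity);
    void RegisterRemoved(object entity);
    void Commit();
}

// Common operations live in the base; one concrete repository per entity derives from it.
public abstract class RepositoryBase<TEntity> where TEntity : class
{
    protected readonly IUnitOfWork UnitOfWork;

    protected RepositoryBase(IUnitOfWork unitOfWork) { UnitOfWork = unitOfWork; }

    public virtual void Add(TEntity entity) => UnitOfWork.RegisterNew(entity);
    public virtual void Remove(TEntity entity) => UnitOfWork.RegisterRemoved(entity);

    // Entity-specific queries are added by the derived repository.
    public abstract IEnumerable<TEntity> GetAll();
}
```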
I have created an application (a web-based application) which now has a large number of associations. Simply put, an Account:
has many Users
has Settings
has many Projects
Similarly a Project:
has many Items
A User:
has many Tasks
And so on, with loads more associations. Nothing particularly unusual about that I hope. I chose to use NHibernate to give me a nice set of persistent classes with mappers defining all of the associations.
But is this approach right? On every request to the application the Account is loaded (because it is needed) and this then requests a large amount of data from the DB that is not required. Lazy loading is an option, but I don't know if my initial approach couldn't be better. All I want at that stage is the Account and associated Settings so should the mapping reflect this? Trouble is I want things like all the Projects for an Account at other points so I need the mappers to reflect all of the associations.
My concern is that lazy-loading may just be compensating for a bad initial architecture on my part. I am well aware that this may just be a result of my as-yet poor knowledge of the inner workings of NHibernate. As a result I'd be grateful for any suggestions on good practice.
Domain Driven Design has helped me a lot in understanding these kinds of situations: Association in DDD
The question you should ask yourself in your situation is: do you really need a bi-directional association, or might a uni-directional association be enough in some cases? That decision is part of the architecture, so you were right. Lazy loading is a big help when you choose bi-directional associations by default, but doing so could be considered a design flaw.
It is generally good practice with NHibernate to enable lazy loading for most associations and objects by default, unless you know you're always going to need the associated data. Then you can switch to eager loading selectively, just for items where that is more efficient, when you come to the optimisation stage.
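For example, with Fluent NHibernate style mappings (assuming you use Fluent NHibernate; plain hbm.xml with lazy="true"/"false" expresses the same thing), using the Account/Settings/Projects entities from the question:

```csharp
using System.Collections.Generic;
using FluentNHibernate.Mapping;

// Minimal entity shapes for the sketch; members are virtual so NHibernate can proxy them.
public class Account
{
    public virtual int Id { get; set; }
    public virtual Settings Settings { get; set; }
    public virtual IList<Project> Projects { get; set; }
}
public class Settings { public virtual int Id { get; set; } }
public class Project { public virtual int Id { get; set; } }

public class AccountMap : ClassMap<Account>
{
    public AccountMap()
    {
        Id(x => x.Id);

        // Needed on every request: load eagerly together with the Account.
        References(x => x.Settings).Not.LazyLoad();

        // Only needed on some pages: leave lazy (the NHibernate default) so the
        // collection is fetched only when the code actually touches it.
        HasMany(x => x.Projects).LazyLoad();
    }
}
```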
Well, in my opinion, your architecture looks healthy. Associations are good.
Consider the alternatives:
Fewer associations: would lead to poor DB design
Don't use an ORM: You would find yourself doing "lazy initialization"-kinda coding anyway
Everything looks fine, and lazy initialization is good :) - However, there are quite a few real-life usage pitfalls:
Don't close the session before using lazily-initialized stuff (otherwise you have to "hack" in some useless read of the association just to force it to load)
With NHibernate, you need to do all DAL activities through it in order for the level-2 cache to work
You'll probably have some overhead (what if I don't want to know the account name, just the users in it?)
This is how ORMs work. They have pros (easier to develop, avoid a lot of boilerplate code) and cons (more overhead, less flexibility).