Does this "overly general" type of programming have a name? - multi-tenant

Anyone who has experience with the Salesforce platform will know it can essentially be used as a backend for a lot of web applications. They let the end user define custom objects and the fields on those objects. So for instance, rather than having some entity as a strongly-typed class in the code, they have a generic "custom object", whose behaviour and data is defined by the fields you choose and the triggers and rules you apply to it. So they don't have to update the code, recompile and redeploy every time a user adds one (which, given they are a web service would be both impractical and cause serious downtime, a lot).
I was thinking how this could be implemented, and I think Salesforce may do it in a very complex way but I'm specifically thinking how I can implement this. So far I've come up with this:
An "object defintion", which contains all the metadata for a specific record type. Equivalent to a hardcoded class definition.
A generic "record", probably with some sort of dictionary/map tying values to field identifiers that exist in the object definition.
When operating on user data, both the record and the object defintion need to be in memory so that the integrity of the data can be checked. Behaviour normally provided by methods can be applied using some kind of trigger system (again, I'm using a Salesforce example here because it's the best example I know of) with defined actions/events.
This whole system seems very clunky, slow (without serious optimisation), and like it would be prone to problems which wouldn't plague 99% of software projects, so I'd like to learn more about it, but I have no idea where to start looking.
Is the idea I've laid out above already an existing paradigm and if so what is it called?

You have encountered the custom-fields. The design is to enable tenant specific fields against a fixed entity. Since multi-tenancy at the highest level demand That a single codebase / database be used for all tenants with the options to full Customization. This design is the best approach. The below link points to a patent That was granted for managing the custom-fields per tenant.
https://www.google.com/patents/US7779039

Related

How to validate from aggregate

I am trying to understand validation from the aggregate entity on the command side of the CQRS pattern when using eventsourcing.
Basically I would like to know what the best practice is in handling validation for:
1. Uniqueness of say a code.
2. The correctness/validation of an eternal aggregate's id.
My initial thoughts:
I've thought about the constructor passing in a service but this seems wrong as the "Create" of the entity should be the values to assign.
I've thought about validation outside the aggregate, but this seem to put logic somewhere that I assume should be responsibility of the aggregate itself.
Can anyone give me some guidance here?
Uniqueness of say a code.
Ensuring uniqueness is a specific example of set validation. The problem with set validation is that, in effect, you perform the check by locking the entire set. If the entire set is included within a single "aggregate", that's easily done. But if the set spans aggregates, then it is kind of a mess.
A common solution for uniqueness is to manage it at the database level; RDBMS are really good at set operations, and are effectively serialized. Unfortunately, that locks you into a database solution with good set support -- you can't easily switch to a document database, or an event store.
Another approach that is sometimes appropriate is to have the single aggregate check for uniqueness against a cached copy of the available codes. That gives you more freedom to choose your storage solution, but it also opens up the possibility that a data race will introduce the duplication you are trying to avoid.
In some cases, you can encode the code uniqueness into the identifier for the aggregate. In effect, every identifier becomes a set of one.
Keep in mind Greg Young's question
What is the business impact of having a failure?
Knowing how expensive a failure is tells you a lot about how much you are permitted to spend to solve the problem.
The correctness/validation of an eternal aggregate's id.
This normally comes in two parts. The easier one is to validate the data against some agreed upon schema. If our agreement is that the identifier is going to be a URI, then I can validate that the data I receive does satisfy that constraint. Similarly, if the identifier is supposed to be a string representation of a UUID, I can test that the data I receive matches the validation rules described in RFC 4122.
But if you need to check that the identifier is in use somewhere else? Then you are going to have to ask.... The main question in this case is whether you need the answer to that right away, or if you can manage to check that asynchronously (for instance, by modeling "unverified identifiers" and "verified identifiers" separately).
And of course you once again get to reconcile all of the races inherent in distributed computing.
There is no magic.

Event model or schema in event store

Events in an event store (event sourcing) are most often persisted in a serialized format with versions to represent a changed in the model or schema for an event type. I haven't been able to find good documentation showing the actual model or schema for an actual event (often data table in event store schema if using a RDBMS) but understand that ideally it should be generic.
What are the most basic fields/properties that should exist in an event?
I've contemplated using json-api as a specification for my events but perhaps that's too "heavy". The benefits I see are flexibility and maturity.
Am I heading down the "wrong path"?
Any well defined examples would be greatly appreciated.
I've contemplated using json-api as a specification for my events but perhaps that's too "heavy". The benefits I see are flexibility and maturity.
Am I heading down the "wrong path"?
Don't overlook forward and backward compatibility.
You should plan to review Greg Young's book on event versioning; it doesn't directly answer your question, but it does cover a lot about the basics of interpreting an event.
Short answer: pretty much everything is optional, because you need to be able to change it later.
You should also review Hohpe's Enterprise Integration Patterns, in particular his work on messaging, which details a lot of cases you may care about.
de Graauw's Nobody Needs Reliable Messaging helped me to understan an important point.
To summarize: if reliability is important on the business level, do it on the business level.
So while there are some interesting bits of meta data tracking that you may want to do, the domain model is really only going to look at the data; and that is going to tend to be specific to your domain.
You also have the fun that the representation of events that you use in the service that produces them may not match the representation that it shares with other services, and in particular may not be the same message that gets broadcast.
I worked through an exercise trying to figure out what the minimum amount of information necessary for a subscriber to look at an event to understand if it cares. My answers were an id (have I seen this specific event before?), a token that tells you the semantic meaning of the message (is that something I care about?), and a location (URI) to get a richer representation if it is something I care about.
But outside of the domain -- for example, when you are looking at the system as a whole trying to figure out what is going on, having correlation identifiers and causation identifiers, time stamps, signatures of the source location, and so on stored in a consistent location in the meta data can be a big help.
Just modelling with basic types that map to Json to write as you would for an API can go a long way.
You can spend a lot of time generating overly complex models if you throw too much tooling at it - things like Apache Thrift and/or Protocol Buffers (or derived things) will provide all sorts of IDL mechanisms for you to generate incidental complexity with.
In .NET land and many other platforms, if you namespace the types you can do various projections from the types
Personally, I've used records and DUs in F# as a design and representation tool
you get intellisense, syntax hilighting, and types you can use from F# or C# for free
if someone wants to look, types.fs has all they need

Repository pattern with "modern" data access strategies

So I was searching the web looking for best practices when implementing the repository pattern with multiple data stores when I found my entire way of looking at the problem turned upside down. Here's what I have...
My application is a BI tool pulling data from (as of now) four different databases. Due to internal constraints, I am currently using LINQ-to-SQL for data access but require a design that will allow me to change to Entity Framework or NHibernate or the next data access du jour. I also hold steadfast to decoupled layers in my apps using an IoC framework (Castle Windsor in this case).
As such, I've used the Repository pattern to abstract the actual data access code from my business layer. As a result, my business object is coded against some I<Entity>Repository interface and the IoC Container is used to manage the actual implementation. In this case, I would expect to have a concrete Linq<Entity>Repository that implements the interface using LINQ-to-SQL to do the work. Later I could replace this with an EF<Entity>Repository with no changes required to my business layer.
Also, because I'm coding against the interface, I can easily mock the repository for unit testing purposes.
So the first question that I have as I begin coding the application is whether I should have one repository per DataContext or per entity (as I've typically done)? Let's say one database contains Customers and Sales with the expected relationship. Should I have a single OrderTrackingRepository with methods that work with both entities or have a separate CustomerRepository and a different SalesRepository?
Next, as a BI tool, the primary interface is for reporting, charting, etc and often will require a "mashup" of data across multiple sources. For instance, the reality is that one database contains customer information while another handles sales information and a third holds other financial information but one of my requirements is to display aggregated information that spans all three. Plus, I have to support dynamic filtering in the UI. Obviously working directly against the LINQ-to-SQL or EF DataContext objects (Table<Entity>, for instance) will allow me to pretty much do anything. What's the best approach to expose that same functionality to my business logic when abstracting the DAL with a repository interface?
This article: link text indicates that EF4 has turned this approach around and that the repository is nothing more than an IQueryable returned from the EF DataContext which brings up a whole other set of questions.
But, I think I've rambled on enough...
UPDATE (Thanks, Steven!)
Okay, let me put a more tangible (for me, at least) example on the table and clarify a few points that will hopefully lead to an approach I can better wrap my head around.
While I understand what Steven has proposed, I have a team of developers I have to consider when implementing such things and I'm afraid they will get lost in the complexity (yes, a real problem here!).
So, let's remove any direct tie-in with Linq-to-Sql because I don't want a solution that is dependant upon the way L2S works - or even EF, for that matter. My intent has been to abstract away the data access technology being used so that I can change it as needed without requiring collateral changes to the consuming code in my business layer. I've accomplished this in the past by presenting the business layer with IRepository interfaces to work against. Perhaps these should have been named IUnitOfWork or, more to my liking, IDataService, but the goal is the same. These interfaces typically exposed methods such as Add, Remove, Contains and GetByKey, for example.
Here's my situation. I have three databases to work with. One is DB2 and contains all of the business information for a customer (franchise) such as their info and their Products, Orders, etc. Another, SQL Server database contains their financial history while a third SQL Server database contains application-specific information. The first two databases are shared by multiple applications.
Through my application, the customer may enter/upload their financial information for a given time period. When entered, I have to perform the following steps:
1.Validate the entered data against a set of static rules. For example, the data must contain a legitimate customer ID value (in the case of an upload). This requires a lookup in the DB2 database to verify that the supplied customer ID exists and is current.
2.Next I have to validate the data against a set of dynamic rules which are contained in the third (SQL Server) database. An example may be that a given value cannot exceed a certain percentage of another value.
3.Once validated, I persist the data to the second SQL Server database containing the financial data.
All the while, my code must have loosely-coupled dependencies so I may mock them in my unit tests.
As part of the analysis, I know that I have three distinct data stores to work with and about a half-dozen or so entities (at this time) that I am working with. In generic terms, I presume that I would have three DataContexts in my application, one per data store, with the entities exposed by the appropriate data context.
I could then create a separate I{repository|unit of work|service} for each entity that would be consumed by my business logic with a concrete implementation that knows which data context to use. But this seems to be a risky proposition as the number of entities increases, so does the number of individual repository|UoW|service types.
Then, take the case of my validation logic which works with multiple entities and, thereby, multiple data contexts. I'm not sure this is the most efficient way to do this.
The other requirement that I have yet to mention is on the reporting side where I will need to execute some complex queries on the data stores. As of right now, these queries will be limited to a single data store at a time, but the possibility is there that I might need to have the ability to mash data together from multiple sources.
Finally, I am considering the idea of pulling out all of the data access stuff for the first two (shared) databases into their own project and have been looking at WCF Data Services as a possible approach. This would give me the basis for a consistent approach for any application making use of this data.
How does this change your thinking?
In your case I would recommend returning IEnummerables's for your data queries for the repo. I usually aggregate calls from multiple repo's through a service class that represents the domain problem and encapsulates my business logic. To keep it clean I try keep my repros focused on the domain problem. I liken my Datacontext to a repo, and extract an interface using a T4 template to make life easier for mocking. But there is nothing stopping you using a traditional repo that encapsulates your calls. Doing it this way will allow you to switch ORM's at any stage.
EDIT: IQueryable IS NOT THE ANSWER! :-)
I have also done a lot of work in this area, and INITIALLY came to the same conclusion, however it is NOT a good solution. The point of the Repo is to abstract queries into discrete chunks of work. Exposing IQueryable is too adhoc and raises some issues later down the line. You loose your ability to scale. You loose your ability to optimize queries (Lets say I want to move to a highly optimized stored proc). You loose your ability to use IoC for the repo to switch out data access layers (switch the project from SQL to Mongo). You loose your ability to provide effective data caching in the Repo (Which is a major strength in the Repo pattern). I would recommend taking a CLOSE look as to WHY we have a Repo pattern. It isn't simply an "ORM" mapping layer. What made this really clear to me was the CQRS pattern.
Further to this allowing the ad-hoc nature of IQueryable opens you to misfitting reuse of queries. It is GENERALLY not a good idea to reuse queries, since query to query you see slight deviations, which ends up with 2 byproducts: Queries become too broad and inefficient. Queries become riddled with unmaintainable IF THEN statements to cater for the deviations.
IQueryable is easy, but opens you up to an unmaintainable mess.
Look at this SO answer. I think it shows a simplified model of what you want. IQueryable<T> is indeed our new Repository :-). DataContext and ObjectContext are our Unit of Work.
UPDATE 2:
Here is a blog post that describes the model you might be looking for.
UPDATE 3
It would be wise to hide the shared databases behind a service. This will solve several problems:
This will make the database private to the service, which makes it much easier to change the implementation when needed.
You can put the needed validation logic (for database 1) in that service and can create tests for that validation logic in that project.
Clients accessing that service can assume correctness of the service, and its validation logic.
The result of this is that your application will send data to the service to validate it. Call the service to fetch data. Query its own private database (database 3) and join the data of the three data source locally together. I've never been a fan of using cross-database or even cross-server (in your situation) database calls and letting the database join everything together. Transactions will be promoted to distributed-transactions and it's hard to predict how many data the servers will exchange.
When you abstract the shared databases behind the service, things get easier (at least from your application's point of view). Your application calls services it trusts which limits the amount of code in that application and the amount of tests. You still want to mock the calls to such a service, but that would be pretty easy. It should also solve the problem of validating over multiple data sources.
Validation is always a hard part. I'm very familiar with Validation Application block, and love it for it's flexibility. It isn't however an easy framework, but you might take a peek at what you can do with it. For instance, I've written several articles about integration with O/RM tools and how to 'embed' a context (context as in DataContext/Unit of Work) in Validation Application Block.
Please have a look at my IRepository pattern implementation using EF 4.0.
My solution has the following features:
supports connections to multiple dbs
One repository per entity
Support for execution of queries
Unit of work pattern implementation
Support for validating entities using VAB guidance
Common operations are kept at base class level. High use of OOPS techniques for code re-usability and ease of maintenance.

design an extendable database model

Currently I'm doing a project whose specifications are unclear - well who doesn't. I wonder what's the best development strategy to design a DB, that's going to be extended sooner or later with additional tables and relations. I want to include "changeability".
My main concern is that I want to apply design patterns (it's a university project) and I want to separate the constant factors from those, that change by choosing appropriate design patterns - in my case MVC and a set of sub-patterns at model level.
When it comes to the DB however, I may have to resdesign my model in my MVC approach, because my domain model at a later stage my require a different set of classes representing the DB tables. I use Hibernate as an abstraction layer between DB and application.
Would you start with a very minimal DB, just a few tables and relations? And what if I want an efficient DB, too? I wonder what strategies are applied in the real world. Stakeholder analysis for example isn't a sufficient planing solution when it comes to changing requirements. I think - at a DB level - my design pattern ends. So there's breach whose impact I'd like to minimize with a smart strategy.
In unclear situations I prefer a minimalistic DB design, supporting the needs known right now. My experience is that any effort to be clever, to model for future needs makes the model more complex. When the new needs arise, they are often in unforseen areas. The extra modeling for future needs doesn't fit the new needs, but rather makes the needed refactoring even harder.
As you already have chosen Hibernate to be able to decouple the DB design and the OO model, I think that sticking with an as simple DB as possible is a good choice.
What you describe is typical for almost every project. There are a few things you can do however.
Try to isolate the concepts (not their realizations) of your problem domain. Remember: Extending a data model is almost always easy (add a new table, a new column etc.) but changing your data model is hard and requires data migration.
I advocate using an Agile development process: Implement only what you need right now, but make sure you understand the complete problem before modeling it.
Another thing you should check before starting to hack away your code is wether your chosen infrastructure is appropriate. Using a relational database when you want to change your schema's very often is usually a bad match. Document databases are schema-less and hence more flexible. I think you should evaluate wether using a relational database is really appropriate for you application.
"Currently I'm doing a project whose specifications are unclear"
Given the 'database' tag, I assume you are asking this question in a database context.
Remember that a database is a set of ASSERTIONS OF FACT (capitalization intended).
If it is unclear what kind of "assertions of fact" your user wants to be registered by the database, then you simply cannot define (the structure of) your database.
And you will be helping both yourself and your user by first trying to clear out everything that is unclear.
In a simple answer: BE MINIMALISTIC.
Try to figure out the main entities. Don´t worry about the properties, you will fill them later. Then, create the relations between the entities. Create a test application using wour favorite ORM (Hibernate?), build some unit tests, and voilà, you have your minimal DB operational. :)
No project begins with requirements entirely known and fixed for all time. Use an agile, iterative approach to the database design so you that you can accommodate change during development.
All database designs are extensible and subject to change during their lifetime. Don't try to avoid change. Just make sure you have the right people and processes in place to manage change effectively when it happens.

Generating UI from DB - the good, the bad and the ugly?

I've read a statement somewhere that generating UI automatically from DB layout (or business objects, or whatever other business layer) is a bad idea. I can also imagine a few good challenges that one would have to face in order to make something like this.
However I have not seen (nor could find) any examples of people attempting it. Thus I'm wondering - is it really that bad? It's definately not easy, but can it be done with any measure success? What are the major obstacles? It would be great to see some examples of successes and failures.
To clarify - with "generating UI automatically" I mean that the all forms with all their controls are generated completely automatically (at runtime or compile time), based perhaps on some hints in metadata on how the data should be represented. This is in contrast to designing forms by hand (as most people do).
Added: Found this somewhat related question
Added 2: OK, it seems that one way this can get pretty fair results is if enough presentation-related metadata is available. For this approach, how much would be "enough", and would it be any less work than designing the form manually? Does it also provide greater flexibility for future changes?
We had a project which would generate the database tables/stored proc as well as the UI from business classes. It was done in .NET and we used a lot of Custom Attributes on the classes and properties to make it behave how we wanted it to. It worked great though and if you manage to follow your design you can create customizations of your software really easily. We also did have a way of putting in "custom" user controls for some very exceptional cases.
All in all it worked out well for us. Unfortunately it is a sold banking product and there is no available source.
it's ok for something tiny where all you need is a utilitarian method to get the data in.
for anything resembling a real application though, it's a terrible idea. what makes for a good UI is the humanisation factor, the bits you tweak to ensure that this machine reacts well to a person's touch.
you just can't get that when your interface is generated mechanically.... well maybe with something approaching AI. :)
edit - to clarify: UI generated from code/db is fine as a starting point, it's just a rubbish end point.
hey this is not difficult to achieve at all and its not a bad idea at all. it all depends on your project needs. a lot of software products (mind you not projects but products) depend upon this model - so they dont have to rewrite their code / ui logic for different client needs. clients can customize their ui the way they want to using a designer form in the admin system
i have used xml for preserving meta data for this sort of stuff. some of the attributes which i saved for every field were:
friendlyname (label caption)
haspredefinedvalues (yes for drop
down list / multi check box list)
multiselect (if yes then check box
list, if no then drop down list)
datatype
maxlength
required
minvalue
maxvalue
regularexpression
enabled (to show or not to show)
sortkey (order on the web form)
regarding positioning - i did not care much and simply generate table tr td tags 1 below the other - however if you want to implement this as well, you can have 1 more attribute called CssClass where you can define ui specific properties (look and feel, positioning, etc) here
UPDATE: also note a lot of ecommerce products follow this kind of dynamic ui when you want to enter product information - as their clients can be selling everything under the sun from furniture to sex toys ;-) so instead of rewriting their code for every different industry they simply let their clients enter meta data for product attributes via an admin form :-)
i would also recommend you to look at Entity-attribute-value model - it has its own pros and cons but i feel it can be used quite well with your requirements.
In my Opinion there some things you should think about:
Does the customer need a function to customize his UI?
Are there a lot of different attributes or elements?
Is the effort of creating such an "rendering engine" worth it?
Okay, i think that its pretty obvious why you should think about these. It really depends on your project if that kind of model makes sense...
If you want to create some a lot of forms that can be customized at runtime then this model could be pretty uselful. Also, if you need to do a lot of smaller tools and you use this as some kind of "engine" then this effort could be worth it because you can save a lot of time.
With that kind of "rendering engine" you could automatically add error reportings, check the values or add other things that are always build up with the same pattern. But if you have too many of this things, elements or attributes then the performance can go down rapidly.
Another things that becomes interesting in bigger projects is, that changes that have to occur in each form just have to be made in the engine, not in each form. This could save A LOT of time if there is a bug in the finished application.
In our company we use a similar model for an interface generator between cash-software (right now i cant remember the right word for it...) and our application, just that it doesnt create an UI, but an output file for one of the applications.
We use XML to define the structure and how the values need to be converted and so on..
I would say that in most cases the data is not suitable for UI generation. That's why you almost always put a a layer of logic in between to interpret the DB information to the user. Another thing is that when you generate the UI from DB you will end up displaying the inner workings of the system, something that you normally don't want to do.
But it depends on where the DB came from. If it was created to exactly reflect what the users goals of the system is. If the users mental model of what the application should help them with is stored in the DB. Then it might just work. But then you have to start at the users end. If not I suggest you don't go that way.
Can you look on your problem from application architecture perspective? I see you as another database terrorist – trying to solve all by writing stored procedures. Why having UI at all? Try do it in DB script. In effect of such approach – on what composite system you will end up? When system serves different businesses – try modularization, selectively discovered components, restrict sharing references. UI shall be replaceable, independent from business layer. When storing so much data in DB – there is hard dependency of UI – system becomes monolith. How you implement MVVM pattern in scenario when UI is generated? Designers like Blend are containing lots of features, which cannot be replaced by most futuristic UI generator – unless – your development platform is Notepad only.
There is a hybrid approach where forms and all are described in a database to ensure consistency server side, which is then compiled to ensure efficiency client side on deploy.
A real-life example is the enterprise software MS Dynamics AX.
It has a 'Data' database and a 'Model' database.
The 'Model' stores forms, classes, jobs and every artefact the application needs to run.
Deploying the new software structure used to be to dump the model database and initiate a CIL compile (CIL for common intermediate language, something used by Microsoft in .net)
This way is suitable for enterprise-wide software and can handle large customizations. But keep in mind that this approach sets a framework that should be well understood by whoever gonna maintain and customize the application later.
I did this (in PHP / MySQL) to automatically generate sections of a CMS that I was building for a client. It worked OK my main problem was that the code that generates the forms became very opaque and difficult to understand therefore difficult to reuse and modify so I did not reuse it.
Note that the tables followed strict conventions such as naming, etc. which made it possible for the UI to expect particular columns and infer information about the naming of the columns and tables. There is a need for meta information to help the UI display the data.
Generally it can work however the thing is if your UI just mirrors the database then maybe there is lots of room to improve. A good UI should do much more than mirror a database, it should be built around human interaction patterns and preferences, not around the database structure.
So basically if you want to be cheap and do a quick-and-dirty interface which mirrors your DB then go for it. The main challenge would be to find good quality code that can do this or write it yourself.
From my perspective, it was always a problem to change edit forms when a very simple change was needed in a table structure.
I always had the feeling we have to spend too much time on rewriting the CRUD forms instead of developing the useful stuff, like processing / reporting / analyzing data, giving alerts for decisions etc...
For this reason, I made long time ago a code generator. So, it become easier to re-generate the forms with a simple restriction: to keep the CSS classes names. Simply like this!
UI was always based on a very "standard" code, controlled by a custom CSS.
Whenever I needed to change database structure, so update an edit form, I had to re-generate the code and redeploy.
One disadvantage I noticed was about the changes (customizations, improvements etc.) done on the previous generated code, which are lost when you re-generate it.
But anyway, the advantage of having a lot of work done by the code-generator was great!
I initially did it for the 2000s Microsoft ASP (Active Server Pages) & Microsoft SQL Server... so, when that technology was replaced by .NET, my code-generator become obsoleted.
I made something similar for PHP but I never finished it...
Anyway, from small experiments I found that generating code ON THE FLY can be way more helpful (and this approach does not exclude the SAVED generated code): no worries about changing database etc.
So, the next step was to create something that I am very proud to show here, and I think it is one nice resolution for the issue raised in this thread.
I would start with applicable use cases: https://data-seed.tech/usecases.php.
I worked to add details on how to use, but if something is still missing please let me know here!
You can change database structure, and with no line of code you can start edit data, and more like this, you have available an API for CRUD operations.
I am still a fan of the "code-generator" approach, and I think it is just a flavor of using XML/XSLT that I used for DATA-SEED. I plan to add code-generator functionalities.

Resources