Using database constraints in web development or just in application logics - webforms

I would like to ask about using database constraints in web development. When I designe a web form with some restrictions, they must be defined in the web form. Should I define them in database too? Or it is just a duplicity?
If I use database constraints I can guarantee data integrity for 100 %. But database constraints makes more difficult web development and debugging errors.
Just question that we discussed between web developer and database administrator.

Ok, a bit of a open ended question.
But, yes, you should define most of the basic constraints in the database.
However, those constraints in most cases don't amount to much. So, say allow nulls, or enforcing referential integrity is a VERY good idea.
However, for somthing like say a discount entered can't be more then say 20% or some such, or things like some value must be > 0? Then that kind of stuff is better left to your code + UI.
So, for basic things like default values (bit field say 0 for false as a default), things like allow a column to be null, and for SURE enforcing RI is a good thing.
but, for those basic values like must be > 0 or must be < 20 or whatever? that in most cases does not make sense to put in the database, and one big reason?
If the database triggers that error, it 99% of the time too late for the user to do anything about such cases, and then you wind up having to write that code to do the same thing in the UI and give the user a friendly message and workflow anyway.
So in general, yes, build such rules in the UI + code. While in some cases such as attempting to add a child record (say adding a invoice without first having added a customer record ?). In that case, you have both the database RI and also have to write the code in the UI. But for those smaller types of "rules" such as > 0, or whatever? That you don't need to put in the database, and as you note, in most cases it will just get in your way.

Related

How to validate from aggregate

I am trying to understand validation from the aggregate entity on the command side of the CQRS pattern when using eventsourcing.
Basically I would like to know what the best practice is in handling validation for:
1. Uniqueness of say a code.
2. The correctness/validation of an eternal aggregate's id.
My initial thoughts:
I've thought about the constructor passing in a service but this seems wrong as the "Create" of the entity should be the values to assign.
I've thought about validation outside the aggregate, but this seem to put logic somewhere that I assume should be responsibility of the aggregate itself.
Can anyone give me some guidance here?
Uniqueness of say a code.
Ensuring uniqueness is a specific example of set validation. The problem with set validation is that, in effect, you perform the check by locking the entire set. If the entire set is included within a single "aggregate", that's easily done. But if the set spans aggregates, then it is kind of a mess.
A common solution for uniqueness is to manage it at the database level; RDBMS are really good at set operations, and are effectively serialized. Unfortunately, that locks you into a database solution with good set support -- you can't easily switch to a document database, or an event store.
Another approach that is sometimes appropriate is to have the single aggregate check for uniqueness against a cached copy of the available codes. That gives you more freedom to choose your storage solution, but it also opens up the possibility that a data race will introduce the duplication you are trying to avoid.
In some cases, you can encode the code uniqueness into the identifier for the aggregate. In effect, every identifier becomes a set of one.
Keep in mind Greg Young's question
What is the business impact of having a failure?
Knowing how expensive a failure is tells you a lot about how much you are permitted to spend to solve the problem.
The correctness/validation of an eternal aggregate's id.
This normally comes in two parts. The easier one is to validate the data against some agreed upon schema. If our agreement is that the identifier is going to be a URI, then I can validate that the data I receive does satisfy that constraint. Similarly, if the identifier is supposed to be a string representation of a UUID, I can test that the data I receive matches the validation rules described in RFC 4122.
But if you need to check that the identifier is in use somewhere else? Then you are going to have to ask.... The main question in this case is whether you need the answer to that right away, or if you can manage to check that asynchronously (for instance, by modeling "unverified identifiers" and "verified identifiers" separately).
And of course you once again get to reconcile all of the races inherent in distributed computing.
There is no magic.

Should I protect my database from invalid data?

I always tend to "protect" my persistance layer from violations via the service layer. However, I am beginning to wonder if it's really necessary. What's the point in taking the time in making my database robust, building relationships & data integrity when it never actually comes into play.
For example, consider a User table with a unique contraint on the Email field. I would naturally want to write blocker code in my service layer to ensure the email being added isn't already in the database before attempting to add anything. In the past I have never really seen anything wrong with it, however, as I have been exposed to more & more best practises/design principles I feel that this approach isn't very DRY.
So, is it correct to always ensure data going to the persistance layer is indeed "valid" or is it more natural to let the invalid data get to the database and handle the error?
Please don't do that.
Implementing even "simple" constraints such as keys is decidedly non-trivial in a concurrent environment. For example, it is not enough to query the database in one step and allow the insertion in another only if the first step returned empty result - what if a concurrent transaction inserted the same value you are trying to insert (and committed) in between your steps one and two? You have a race condition that could lead to duplicated data. Probably the simplest solution for this is to have a global lock to serialize transactions, but then scalability goes out of the window...
Similar considerations exist for other combinations of INSERT / UPDATE / DELETE operations on keys, as well as other kinds of constraints such as foreign keys and even CHECKs in some cases.
DBMSes have devised very clever ways over the decades to be both correct and performant in situations like these, yet allow you to easily define constraints in declarative manner, minimizing the chance for mistakes. And all the applications accessing the same database will automatically benefit from these centralized constraints.
If you absolutely must choose which layer of code shouldn't validate the data, the database should be your last choice.
So, is it correct to always ensure data going to the persistance layer is indeed "valid" (service layer) or is it more natural to let the invalid data get to the database and handle the error?
Never assume correct data and always validate at the database level, as much as you can.
Whether to also validate in upper layers of code depends on a situation, but in the case of key violations, I'd let the database do the heavy lifting.
Even though there isn't a conclusive answer, I think it's a great question.
First, I am a big proponent of including at least basic validation in the database and letting the database do what it is good at. At minimum, this means foreign keys, NOT NULL where appropriate, strongly typed fields wherever possible (e.g. don't put a text field where an integer belongs), unique constraints, etc. Letting the database handle concurrency is also paramount (as #Branko Dimitrijevic pointed out) and transaction atomicity should be owned by the database.
If this is moderately redundant, than so be it. Better too much validation than too little.
However, I am of the opinion that the business tier should be aware of the validation it is performing even if the logic lives in the database.
It may be easier to distinguish between exceptions and validation errors. In most languages, a failed data operation will probably manifest as some sort of exception. Most people (me included) are of the opinion that it is bad to use exceptions for regular program flow, and I would argue that email validation failure (for example) is not an "exceptional" case.
Taking it to a more ridiculous level, imagine hitting the database just to determine if a user had filled out all required fields on a form.
In other words, I'd rather call a method IsEmailValid() and receive a boolean than try to have to determine if the database error which was thrown meant that the email was already in use by someone else.
This approach may also perform better, and avoid annoyances like skipped IDs because an INSERT failed (speaking from a SQL Server perspective).
The logic for validating the email might very well live in a reusable stored procedure if it is more complicated than simply a unique constraint.
And ultimately, that simple unique constraint provides final protection in case the business tier makes a mistake.
Some validation simply doesn't need to make a database call to succeed, even though the database could easily handle it.
Some validation is more complicated than can be expressed using database constructs/functions alone.
Business rules across applications may differ even against the same (completely valid) data.
Some validation is so critical or expensive that it should happen prior to data access.
Some simple constraints like field type/length can be automated (anything run through an ORM probably has some level of automation available).
Two reasons to do it. The db may be accessed from another application..
You might make a wee error in your code, and put data in the db, which because your service layer operates on the assumption that this could never happen, makes it fall over if you are lucky, silent data corruption being worst case.
I've always looked at rules in the DB as backstop for that exceptionally rare occasion when I make a mistake in the code. :)
The thing to remember, is if you need to , you can always relax a constraint, tightening them after your users have spent a lot of effort entering data will be far more problematic.
Be real wary of that word never, in IT, it means much sooner than you wished.

In MS CRM 4.0, is there a way to ensure that a field is unique?

I am creating a new entity in CRM 4.0 to track our projects. One field is a Project Code, and I'd like to have a way to ensure that this field contains a unique value.
I understand that this is not a key, and it won't be used as a key, but for human readability/tracking purposes, it would be nice if I could tell the user that the code he just entered has already been used.
I am thinking that a webservice/javascript call will be necessary, but I wanted to see if anyone else has tackled this issue already.
Depends on how foolproof you want it to be.
The web service call is pretty lightweight, but if two people save a record at the same time, it's not going to detect it at that time, and dupe codes will happen.
A custom plugin would definitely detect dupe codes, but you don't get any feedback until after the user attempts to save. There's also still a small chance there could be repeat codes from users entering records.
The completely bulletproof way we've used is to have a plugin that checks a custom database table that we lock and then only let one plugin at a time through.

design an extendable database model

Currently I'm doing a project whose specifications are unclear - well who doesn't. I wonder what's the best development strategy to design a DB, that's going to be extended sooner or later with additional tables and relations. I want to include "changeability".
My main concern is that I want to apply design patterns (it's a university project) and I want to separate the constant factors from those, that change by choosing appropriate design patterns - in my case MVC and a set of sub-patterns at model level.
When it comes to the DB however, I may have to resdesign my model in my MVC approach, because my domain model at a later stage my require a different set of classes representing the DB tables. I use Hibernate as an abstraction layer between DB and application.
Would you start with a very minimal DB, just a few tables and relations? And what if I want an efficient DB, too? I wonder what strategies are applied in the real world. Stakeholder analysis for example isn't a sufficient planing solution when it comes to changing requirements. I think - at a DB level - my design pattern ends. So there's breach whose impact I'd like to minimize with a smart strategy.
In unclear situations I prefer a minimalistic DB design, supporting the needs known right now. My experience is that any effort to be clever, to model for future needs makes the model more complex. When the new needs arise, they are often in unforseen areas. The extra modeling for future needs doesn't fit the new needs, but rather makes the needed refactoring even harder.
As you already have chosen Hibernate to be able to decouple the DB design and the OO model, I think that sticking with an as simple DB as possible is a good choice.
What you describe is typical for almost every project. There are a few things you can do however.
Try to isolate the concepts (not their realizations) of your problem domain. Remember: Extending a data model is almost always easy (add a new table, a new column etc.) but changing your data model is hard and requires data migration.
I advocate using an Agile development process: Implement only what you need right now, but make sure you understand the complete problem before modeling it.
Another thing you should check before starting to hack away your code is wether your chosen infrastructure is appropriate. Using a relational database when you want to change your schema's very often is usually a bad match. Document databases are schema-less and hence more flexible. I think you should evaluate wether using a relational database is really appropriate for you application.
"Currently I'm doing a project whose specifications are unclear"
Given the 'database' tag, I assume you are asking this question in a database context.
Remember that a database is a set of ASSERTIONS OF FACT (capitalization intended).
If it is unclear what kind of "assertions of fact" your user wants to be registered by the database, then you simply cannot define (the structure of) your database.
And you will be helping both yourself and your user by first trying to clear out everything that is unclear.
In a simple answer: BE MINIMALISTIC.
Try to figure out the main entities. Don´t worry about the properties, you will fill them later. Then, create the relations between the entities. Create a test application using wour favorite ORM (Hibernate?), build some unit tests, and voilà, you have your minimal DB operational. :)
No project begins with requirements entirely known and fixed for all time. Use an agile, iterative approach to the database design so you that you can accommodate change during development.
All database designs are extensible and subject to change during their lifetime. Don't try to avoid change. Just make sure you have the right people and processes in place to manage change effectively when it happens.

Generating UI from DB - the good, the bad and the ugly?

I've read a statement somewhere that generating UI automatically from DB layout (or business objects, or whatever other business layer) is a bad idea. I can also imagine a few good challenges that one would have to face in order to make something like this.
However I have not seen (nor could find) any examples of people attempting it. Thus I'm wondering - is it really that bad? It's definately not easy, but can it be done with any measure success? What are the major obstacles? It would be great to see some examples of successes and failures.
To clarify - with "generating UI automatically" I mean that the all forms with all their controls are generated completely automatically (at runtime or compile time), based perhaps on some hints in metadata on how the data should be represented. This is in contrast to designing forms by hand (as most people do).
Added: Found this somewhat related question
Added 2: OK, it seems that one way this can get pretty fair results is if enough presentation-related metadata is available. For this approach, how much would be "enough", and would it be any less work than designing the form manually? Does it also provide greater flexibility for future changes?
We had a project which would generate the database tables/stored proc as well as the UI from business classes. It was done in .NET and we used a lot of Custom Attributes on the classes and properties to make it behave how we wanted it to. It worked great though and if you manage to follow your design you can create customizations of your software really easily. We also did have a way of putting in "custom" user controls for some very exceptional cases.
All in all it worked out well for us. Unfortunately it is a sold banking product and there is no available source.
it's ok for something tiny where all you need is a utilitarian method to get the data in.
for anything resembling a real application though, it's a terrible idea. what makes for a good UI is the humanisation factor, the bits you tweak to ensure that this machine reacts well to a person's touch.
you just can't get that when your interface is generated mechanically.... well maybe with something approaching AI. :)
edit - to clarify: UI generated from code/db is fine as a starting point, it's just a rubbish end point.
hey this is not difficult to achieve at all and its not a bad idea at all. it all depends on your project needs. a lot of software products (mind you not projects but products) depend upon this model - so they dont have to rewrite their code / ui logic for different client needs. clients can customize their ui the way they want to using a designer form in the admin system
i have used xml for preserving meta data for this sort of stuff. some of the attributes which i saved for every field were:
friendlyname (label caption)
haspredefinedvalues (yes for drop
down list / multi check box list)
multiselect (if yes then check box
list, if no then drop down list)
datatype
maxlength
required
minvalue
maxvalue
regularexpression
enabled (to show or not to show)
sortkey (order on the web form)
regarding positioning - i did not care much and simply generate table tr td tags 1 below the other - however if you want to implement this as well, you can have 1 more attribute called CssClass where you can define ui specific properties (look and feel, positioning, etc) here
UPDATE: also note a lot of ecommerce products follow this kind of dynamic ui when you want to enter product information - as their clients can be selling everything under the sun from furniture to sex toys ;-) so instead of rewriting their code for every different industry they simply let their clients enter meta data for product attributes via an admin form :-)
i would also recommend you to look at Entity-attribute-value model - it has its own pros and cons but i feel it can be used quite well with your requirements.
In my Opinion there some things you should think about:
Does the customer need a function to customize his UI?
Are there a lot of different attributes or elements?
Is the effort of creating such an "rendering engine" worth it?
Okay, i think that its pretty obvious why you should think about these. It really depends on your project if that kind of model makes sense...
If you want to create some a lot of forms that can be customized at runtime then this model could be pretty uselful. Also, if you need to do a lot of smaller tools and you use this as some kind of "engine" then this effort could be worth it because you can save a lot of time.
With that kind of "rendering engine" you could automatically add error reportings, check the values or add other things that are always build up with the same pattern. But if you have too many of this things, elements or attributes then the performance can go down rapidly.
Another things that becomes interesting in bigger projects is, that changes that have to occur in each form just have to be made in the engine, not in each form. This could save A LOT of time if there is a bug in the finished application.
In our company we use a similar model for an interface generator between cash-software (right now i cant remember the right word for it...) and our application, just that it doesnt create an UI, but an output file for one of the applications.
We use XML to define the structure and how the values need to be converted and so on..
I would say that in most cases the data is not suitable for UI generation. That's why you almost always put a a layer of logic in between to interpret the DB information to the user. Another thing is that when you generate the UI from DB you will end up displaying the inner workings of the system, something that you normally don't want to do.
But it depends on where the DB came from. If it was created to exactly reflect what the users goals of the system is. If the users mental model of what the application should help them with is stored in the DB. Then it might just work. But then you have to start at the users end. If not I suggest you don't go that way.
Can you look on your problem from application architecture perspective? I see you as another database terrorist – trying to solve all by writing stored procedures. Why having UI at all? Try do it in DB script. In effect of such approach – on what composite system you will end up? When system serves different businesses – try modularization, selectively discovered components, restrict sharing references. UI shall be replaceable, independent from business layer. When storing so much data in DB – there is hard dependency of UI – system becomes monolith. How you implement MVVM pattern in scenario when UI is generated? Designers like Blend are containing lots of features, which cannot be replaced by most futuristic UI generator – unless – your development platform is Notepad only.
There is a hybrid approach where forms and all are described in a database to ensure consistency server side, which is then compiled to ensure efficiency client side on deploy.
A real-life example is the enterprise software MS Dynamics AX.
It has a 'Data' database and a 'Model' database.
The 'Model' stores forms, classes, jobs and every artefact the application needs to run.
Deploying the new software structure used to be to dump the model database and initiate a CIL compile (CIL for common intermediate language, something used by Microsoft in .net)
This way is suitable for enterprise-wide software and can handle large customizations. But keep in mind that this approach sets a framework that should be well understood by whoever gonna maintain and customize the application later.
I did this (in PHP / MySQL) to automatically generate sections of a CMS that I was building for a client. It worked OK my main problem was that the code that generates the forms became very opaque and difficult to understand therefore difficult to reuse and modify so I did not reuse it.
Note that the tables followed strict conventions such as naming, etc. which made it possible for the UI to expect particular columns and infer information about the naming of the columns and tables. There is a need for meta information to help the UI display the data.
Generally it can work however the thing is if your UI just mirrors the database then maybe there is lots of room to improve. A good UI should do much more than mirror a database, it should be built around human interaction patterns and preferences, not around the database structure.
So basically if you want to be cheap and do a quick-and-dirty interface which mirrors your DB then go for it. The main challenge would be to find good quality code that can do this or write it yourself.
From my perspective, it was always a problem to change edit forms when a very simple change was needed in a table structure.
I always had the feeling we have to spend too much time on rewriting the CRUD forms instead of developing the useful stuff, like processing / reporting / analyzing data, giving alerts for decisions etc...
For this reason, I made long time ago a code generator. So, it become easier to re-generate the forms with a simple restriction: to keep the CSS classes names. Simply like this!
UI was always based on a very "standard" code, controlled by a custom CSS.
Whenever I needed to change database structure, so update an edit form, I had to re-generate the code and redeploy.
One disadvantage I noticed was about the changes (customizations, improvements etc.) done on the previous generated code, which are lost when you re-generate it.
But anyway, the advantage of having a lot of work done by the code-generator was great!
I initially did it for the 2000s Microsoft ASP (Active Server Pages) & Microsoft SQL Server... so, when that technology was replaced by .NET, my code-generator become obsoleted.
I made something similar for PHP but I never finished it...
Anyway, from small experiments I found that generating code ON THE FLY can be way more helpful (and this approach does not exclude the SAVED generated code): no worries about changing database etc.
So, the next step was to create something that I am very proud to show here, and I think it is one nice resolution for the issue raised in this thread.
I would start with applicable use cases: https://data-seed.tech/usecases.php.
I worked to add details on how to use, but if something is still missing please let me know here!
You can change database structure, and with no line of code you can start edit data, and more like this, you have available an API for CRUD operations.
I am still a fan of the "code-generator" approach, and I think it is just a flavor of using XML/XSLT that I used for DATA-SEED. I plan to add code-generator functionalities.

Resources