For application developers, I suppose the traditional paradigm for writing an application with domain objects that can be persisted to an underlying data store (an SQL database, for argument's sake) is to write the domain objects and then write (or generate) the table structure. There is a tight coupling between what the domain object looks like and the structure of the underlying data store. So if you want to add a piece of information to your domain object, you add the field to your code and then add a column to the appropriate database table. All familiar?
This is all well and good for data stores that have a well-defined structure (I'm mainly talking about SQL databases, where the tables and columns are pre-defined and fixed), but a number of alternatives to the ubiquitous SQL database now exist, and these often do not constrain the data in this way. For instance, MongoDB is a NoSQL database in which you divide data into collections, but aside from that there is no imposed structure. You don't define new columns when you want to add a new field.
Now to the question: given the flexibility of a data store like MongoDB, how would one go about achieving a similar kind of flexibility in the domain objects that represent this data? So for instance, if I'm using Spring and creating my own domain objects, when I add a "middleName" field to my data, how can I avoid having to add a "middleName" field to my domain object? I'm looking for some kind of mechanism/approach/framework to dynamically inspect the data and have access to it in my domain object without having to make a code change every time. All ideas welcome.
I think you have a couple of choices:
You can use a dynamic programming language and not have domain objects (Clojure, for example)
If you're fixed on using Java, the Mongo Java driver returns data in a DBObject, which is essentially a Map. So the default behavior already provides what you want. It's only when you map the DBObject into domain objects, using a library like Morphia (or Spring Data), that you have to worry about domain objects at all.
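For instance, a minimal sketch with the legacy driver's BasicDBObject (the field names here are made up):

import com.mongodb.BasicDBObject;

public class DynamicFieldsDemo {
    public static void main(String[] args) {
        // DBObject behaves like a Map, so a new field needs no class change.
        BasicDBObject doc = new BasicDBObject("firstName", "Ada")
                .append("lastName", "Lovelace");
        doc.put("middleName", "King"); // field "added" at runtime

        for (String key : doc.keySet()) {
            System.out.println(key + " = " + doc.get(key));
        }
    }
}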
But if I were using Java, I would stick with the standard convention of domain objects mapped via Morphia, because I think adding a field is a very minor inconvenience compared with the benefits.
I think the question is inherently paradoxical:
On one hand, you want to have domain objects, i.e. objects that represent the data (and behaviour) of your problem domain.
On the other hand, you say that you don't want your domain objects to be explicitly influenced by changes to the data.
But when you have objects that represent your problem domain, you want to do just that: represent your problem domain.
So if, for example, a middle name is added, then your representation of the real-life 'User' entity should change to accommodate this change to the real-life user; perhaps not only by adding this piece of data to your object, but also by adding some related behaviour (validation of the middle name, or some functionality related to it).
In essence, what I'm trying to say here is that when you have (classic OO) domain objects, you may need to change your behaviour / functionality along with your data, and since you don't have any automatic way of changing your behaviour, the question of automatically changing your data becomes irrelevant.
If you don't want behaviour associated with your data, then you essentially have DTOs, and @Kevin's answer is what you're looking for.
Honestly, it sounds more like you're looking for some kind of black-box DTO where, as you describe, fields are added or removed "arbitrarily" depending on the data. That makes me inclined to suggest a simple Map to do the job. You can't really have a domain-driven design if your domain model is constantly changing.
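A minimal sketch of that Map-backed approach in Java (names illustrative):

import java.util.LinkedHashMap;
import java.util.Map;

// A "black-box" DTO: fields come and go with the data, not the code.
public class DynamicDto {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    public Object get(String name) { return fields.get(name); }
    public void set(String name, Object value) { fields.put(name, value); }
    public boolean has(String name) { return fields.containsKey(name); }

    public static void main(String[] args) {
        DynamicDto user = new DynamicDto();
        user.set("firstName", "Ada");
        user.set("middleName", "King"); // added without touching any class
        System.out.println(user.get("middleName"));
    }
}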
I want to define one schema that will be valid across teams and platforms. This is pretty simple and can be thought of as a kind of ontology. What I need is the ability to define what a field represents and, under it, the name of that field on each platform. I'd like the schema to be able to generate data objects for each of the languages we use, and therefore I'd like to know whether my need can be met with Protobuf or GraphQL. Notice: my naming conventions can differ from the idiomatic ones in the generated target language, since they need to be compatible with the databases. A simple example of my need:
{
  "lastName": {
    "mssqlName": "LastName",
    "oracleName": "FamilyName",
    "elasticName": "lastName",
    "cassandraName": "last_name",
    "rocksDbName": "surname"
  },
  "age": {
    ...
  }
}
As you can see, on some platforms I have totally different names than on others. I'd like to know the usual ways / technologies to solve this problem, and whether it will be possible with codegen-able technologies like Proto & GraphQL.
A single schema as the single point of truth for all object / message definitions across databases, comms links, multiple languages and platforms? It would be nice, wouldn't it?
The closest I can think of is XSD (XML Schema), but I don't think the tooling covers everything. For example, I know of tools that will take an XSD schema and generate code that will serialise / deserialise objects to / from XML (e.g. Microsoft's xsd.exe). There are even some good ones.
And then there are tools that will create SQL tables from that XSD schema. But a code generator that builds classes to access those tables isn't also building them to serialise / deserialise objects to an XML wire format.
Basically, I've not come across a schema language that has tooling that does everything. The ASN.1 tools are very good at creating serialisation classes, but I've never found one that also targets SQL interactions. Same with XSD.
My knowledge is of course not exhaustive, and there might be something in JSON-land that works.
Minimum Pain Compromise Approach
What I have settled on in the past is to accept that I'll have to do some manual coding around schema changes, but probably not too much. I'd define messages fully in, say, Google Protocol Buffers, and use that for object exchange between applications / languages. Where I wanted to stash objects in a database, I'd accept that I'd need a parallel definition of the object in the table columns, but only for the critical fields I'd want to search on. The last column would be an arbitrary container, able to store the serialised object whole.
For example, suppose a GPB message had an integer ID field and a string Name field, plus a bunch of other fields. My database table would then have an ID column, a Name column, and a column for storing bytes.
That way I could serialise an object and push it into a row's Bytes column whilst also filling in the ID and Name columns. I could quickly search for objects because of the Name / ID columns. If I then wanted access to the other fields in the object stored in the database, I'd have to retrieve the record from the database and deserialise the Bytes column.
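A sketch of that round trip in Java, assuming a hypothetical GPB-generated class PersonProto.Person and plain JDBC:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Assumed table: CREATE TABLE person (id INT, name VARCHAR(255), body BLOB)
public class PersonStore {
    // Store the key columns for searching, plus the whole object serialised.
    void save(Connection conn, PersonProto.Person p) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO person (id, name, body) VALUES (?, ?, ?)")) {
            ps.setInt(1, p.getId());
            ps.setString(2, p.getName());
            ps.setBytes(3, p.toByteArray()); // the whole message, serialised
            ps.executeUpdate();
        }
    }

    // Fast lookup on the key column; deserialise to get the other fields.
    PersonProto.Person findByName(Connection conn, String name) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT body FROM person WHERE name = ?")) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next()
                        ? PersonProto.Person.parseFrom(rs.getBytes(1))
                        : null;
            }
        }
    }
}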
This way one is essentially betting that those key columns / field names (ID, Name) won't ever change in the schema during development. But it's quite likely a safe bet. Generally, one can settle things like that quite easily early on in a project; it's the rest of the schema that might change during development.
One small payoff is that if the reason to hunt out an object in the database is to be able to send it through a communications channel, it is already serialised in the database. No need to serialise it again before dispatch down the comms link.
So this approach can leave one with some duplication of code / points of truth, but can be quite performant in avoiding a serialisation step during parts of runtime.
You can also cheat a little. If the serialisation wireformat is text based (JSON, XML, some ASN.1 formats, etc), then there's a good chance that string searches on the bytes column will yield good results anyway. For instance, suppose a message field was MiddleName, but I'd not created that as a distinct table column in the database. I could find likely records for any given MiddleName by searching for the value in the Bytes column, as it's stored as text somewhere in there.
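For instance, a sketch in Java of that crude pre-filter, assuming the serialised JSON lives in a text-ish column named body (table and column names invented):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TextSearch {
    // Crude pre-filter: string-match the serialised JSON text, then
    // deserialise the hits to confirm the match properly.
    static void findLikelyByMiddleName(Connection conn, String middleName)
            throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id FROM person WHERE body LIKE ?")) {
            ps.setString(1, "%\"MiddleName\":\"" + middleName + "\"%");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println("candidate id: " + rs.getLong("id"));
                }
            }
        }
    }
}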
Reflection Based Approach?
A potential other approach is to accept that the tooling does not exist to satisfy all needs, and adapt using language features (reflection) to exploit a common feature of code generators.
For example, consider GPB's proto compiler. In the generated code you end up with classes whose members are named after the fields in messages. And it'll be more or less the same with any code generated to access a database table that has columns by the same name.
So it is possible to use reflection to build an auto-transcriber between generated classes: you iterate down the tree of members in one class and match each one up to a member in a different generated class.
This avoids the need for code like:
Protobuf::MyClass myObj_g; // An object built using GPB
JSON::MyClass myObj_j; // equivalent object to be copied from myObj_g;
myObj_j.Field1 = myObj_g.Field1;
myObj_j.Field2 = myObj_g.Field2;
.
.
.
Instead:
Protobuf::MyClass myObj_g; // An object built using GPB
JSON::MyClass myObj_j; // equivalent object to be copied from myObj_g;
foreach (Protobuf::MyClass::Reflection::Field field in Protobuf::MyClass.Fields)
{
    // pseudocode: look up each field by name on both sides and copy it across
    myObj_j.Reflection.SetByName(field.Name, myObj_g.Reflection.GetByName(field.Name));
}
There'd be a fair bit of fiddling around to get this to work between each database and serialisation technology, per language, but the point is you'd only ever have to write it once. Any subsequent schema changes do not require code changes, at least not so far as exchanging objects between a serialisation technology and a database access technology.
Obviously, reflection is easier / possible in some languages and not others.
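In Java, for instance, every GPB-generated class implements Message, whose descriptor API allows exactly this kind of generic copy. A minimal sketch, using a Map to stand in for the other generated class:

import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Message;
import java.util.HashMap;
import java.util.Map;

public class ProtoTranscriber {
    // Works for any generated GPB message type; no per-message code needed.
    static Map<String, Object> toFieldMap(Message msg) {
        Map<String, Object> out = new HashMap<>();
        // getAllFields() returns every field that is set, keyed by descriptor.
        for (Map.Entry<FieldDescriptor, Object> e : msg.getAllFields().entrySet()) {
            out.put(e.getKey().getName(), e.getValue());
        }
        return out;
    }
}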
The Fix It At Runtime Approach?
Apache Avro has the characteristic that serialised data describes its own shape. Basically, wire-format data comes with its own schema, so a consumer can build a representation of the data automatically. In some languages that's horrid (C, C++), but libraries exist.
Basically, it forces you to write applications so that they work out what to do with the data for themselves.
What I see a lot is that people use an Object-Relational Mapper (ORM) for doing SQL stuff when working in an MVC environment. But if I really have complex queries, I would like to write the whole query myself. What is the best practice for this kind of situation?
Having an abstraction layer between your model and the database for the complex queries
Still using the model, creating specific methods that handle the queries
Or is there any other way that might be better? Please tell me :)
Consider the Single Responsibility Principle. Specifically, the question would be...
"If I put data access logic in my model, what will that mean when I need to change something?"
Any time you need to change business logic, you're also changing the objects which maintain data access logic. So the data access logic also needs to be re-tested. Conversely, any time you need to change data access logic, you're also changing the objects which maintain business logic. So the business logic also needs to be re-tested.
As the logic expands, this becomes more difficult very quickly.
The idea behind the Single Responsibility Principle is to separate the dependencies of different roles which can enact changes to the application. (Keep in mind that "roles" doesn't map 1-to-1 with "people." One person may have multiple roles, but it's still important to separate those roles.) It's a matter of simpler support. If you want to make a change to a database query (say, for performance reasons) which shouldn't have any visible effect on anything else in the system, then there's no reason to be changing objects which contain business logic.
1. Having an abstraction layer between your model and the database for the complex queries
Yes, you should have a persistence abstraction that sits between storage (database or any other data source) and your business logic. Your business logic should not depend on "where", "how", or even "if" the data is actually stored.
Basically, your code should (at least try to) adhere to SOLID principles, but as @david already pointed out, you are already violating the first one on that list.
Also, you should consider using a service layer, which would be responsible for the interaction between your domain model implementation and your persistence abstraction (it doesn't matter whether you are using custom-written data mappers or some 3rd-party ORM).
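A sketch of the shape this can take in Java (all names here are illustrative, not from any particular framework):

// Domain object (minimal)
class User {
    private String name;
    void setName(String name) { this.name = name; }
}

// The business logic sees only this abstraction, never SQL.
interface UserRepository {
    User findById(long id);
    void save(User user);
}

// The service layer coordinates domain objects and persistence;
// the hairy SQL hides behind UserRepository's implementation.
class UserService {
    private final UserRepository users;

    UserService(UserRepository users) { this.users = users; }

    void rename(long id, String newName) {
        User u = users.findById(id);
        u.setName(newName); // business rule lives with the domain object
        users.save(u);
    }
}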
In the article (more like an excerpt, actually) the "MVC model" is actually all three concentric circles together. The domain model is not code; it is actually a term that describes the accumulated knowledge about the project. Most of the domain model gets turned into pieces of code. Those pieces are referred to as domain objects.
2. Still using the model, creating specific methods that handle the queries
This would imply an implementation of the active record pattern. It is a useful, but mostly misused, pattern for cases when your objects have no (or almost no) business logic. Basically, you should use active record only if all you need are glorified setters and getters that talk to the database.
The active record pattern is a very good choice when you need to quickly prototype something, but it should not be used when you are attempting to implement a fully realized model layer.
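For illustration, a minimal active-record sketch in Java (table and column names invented):

import java.sql.Connection;
import java.sql.PreparedStatement;

// Glorified getters/setters that know how to persist themselves.
class UserRecord {
    long id;
    String name;

    void save(Connection conn) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE users SET name = ? WHERE id = ?")) {
            ps.setString(1, name);
            ps.setLong(2, id);
            ps.executeUpdate();
        }
    }
}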
ORMs in general do not have any specific drawbacks versus using direct SQL to fetch data from the database. ORMs, as the name implies, help keep your relational model (designed using your SQL DDLs or JPA annotations) and your OO model in sync, and help them integrate well together.
When using an ORM, you can write your queries in JPQL, which is an object-oriented query language. So instead of writing queries that manipulate tables, you are writing queries that manipulate objects. You use the relationships between these objects to get your desired result. Now, I understand that sometimes it's easier to just write native SQL, so the JPA specification allows you to run native SQL! This just returns you a list of generic objects which you can organize any way you like. When you choose to go this route and pick a JPA provider, like Hibernate, these providers have extended functionality. So if you do have complex relationships, you can use APIs like Hibernate's Criteria Builder to help you create queries for those relationships.
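For example, with a JPA EntityManager both styles look roughly like this (the entity and table names are invented):

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import java.util.List;

@Entity
class User {
    @Id long id;
    String name;
    int age;
}

class UserQueries {
    // JPQL: query over objects and their relationships, not tables.
    List<User> adults(EntityManager em) {
        return em.createQuery(
                "SELECT u FROM User u WHERE u.age >= 18", User.class)
                .getResultList();
    }

    // Native SQL escape hatch: each row comes back as a generic object.
    List<?> adultsNative(EntityManager em) {
        return em.createNativeQuery(
                "SELECT * FROM users WHERE age >= 18")
                .getResultList();
    }
}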
So, if you're building a large MVC application, it would generally be a good idea to have this abstraction layer in the middle, handling all these relationships. It makes it easier for you, the developer, to just look at the big picture and the business side of the application.
IMHO, no. I think even the ORM layer often adds more complexity than needed. Databases have very good and sophisticated mechanisms for high-level data manipulation: triggers, views, constraints, complex keying and indexing, (sub)transactions, stored procedures, and procedural extensions of the query language are normally more than enough for everything.
Because of their structural barriers, ORMs can't give a real interface to this feature set.
And the common practice is that applications use practically only a NoSQL-style record service out of all of this, and reimplement in an unneeded "middleware" what was really the mission of the database.
What I would find really interesting is if the feature set of the databases got some OO-like interface (see "SQL abstract types"), and the client-side logic went into the application (see "REST"). That would practically eliminate the need for the middle layer.
I have predefined tables in the database based on which I have to develop a web application.
Should I base my model classes on the structure of the data in the tables?
But a problem is that the tables are very poorly defined and there is much redundant data in them (which I cannot change!).
E.g. in two tables, three columns are the same:

Table: Student_details
Student_id, Name, Age, Class, School

Table: Student_address
Student_id, Name, Age, Street1, Street2, City
I think you should make your models in a way that is best suited to how they will be used. Don't worry about how the data is stored or where it is stored... otherwise, why go through the trouble of layering your code? Why not just do the direct DB query right in your view? So if you are going to create an abstraction of your data... a "model"... make one that is designed around how it will be used, not how it is or will be persisted.
This seems like a risky project: presumably, there's another application somewhere which populates these tables. As the data model is not very sound from a relational point of view, I'm guessing there's a bunch of business/data logic glued into that app; for instance, putting the student age into the Student_address table.
I'd support jsobo in recommending you build your business logic independently of the underlying persistence mechanism, and that you try to keep your models as domain-focused as possible, without too much emphasis on how the database happens to be structured.
You should, however, plan on spending a certain amount of time translating your domain models into their respective data representations and dealing with whatever quirks the data model imposes. I'd strongly recommend containing all this stuff in a separate translation layer - don't litter it throughout the rest of the application.
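Here's a sketch in Java of such a translation layer for the two tables above; the class and method names are invented, and the point is that the redundancy is reconciled in exactly one place:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Clean domain model: one Student, regardless of how the tables overlap.
class Student {
    long id;
    String name;
    int age;
    String city;
}

// All knowledge of the redundant tables is confined to this class.
class StudentTranslationLayer {
    Student load(Connection conn, long studentId) throws Exception {
        String sql = "SELECT d.Name, d.Age, a.City "
                   + "FROM Student_details d "
                   + "JOIN Student_address a ON a.Student_id = d.Student_id "
                   + "WHERE d.Student_id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, studentId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return null;
                Student s = new Student();
                s.id = studentId;
                s.name = rs.getString("Name"); // trust Student_details's copy
                s.age = rs.getInt("Age");
                s.city = rs.getString("City");
                return s;
            }
        }
    }
}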
How closely does your data model map to your UI and domain model?
The data model can be quite close to the domain model if it has, for example, a Customer table, an Employee table etc.
The UI might not reflect the data model so closely, though - for example, there may be multiple forms, all feeding in bits and pieces of Customer data along with other miscellaneous bits of data. In this case, one could have separate tables to hold the data from each form. As required, the data can then be combined at a future point... Alternatively, one could insert the form data directly into a Customer table, so that the data model does not correlate well to the UI.
What has proven to work better for you?
I find it cleaner to map your domain model to the real world problem you are trying to solve.
You can then create viewmodels which act as a bucket of all the data required by your view.
As stated, your UI can change frequently, but this does not usually change the particular domain problem you are tackling...
Information on this pattern can be found here:
http://blogs.msdn.com/dphill/archive/2009/01/31/the-viewmodel-pattern.aspx
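A bare-bones sketch of the split in Java (names invented):

// Domain model: maps to the real-world problem, carries behaviour.
class Customer {
    String firstName;
    String lastName;
}

// View model: a bucket of exactly what one screen needs, nothing more.
class CustomerSummaryViewModel {
    final String displayName;
    final int openOrderCount;

    CustomerSummaryViewModel(Customer c, int openOrderCount) {
        this.displayName = c.firstName + " " + c.lastName;
        this.openOrderCount = openOrderCount;
    }
}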
UI can change according to many needs, so it's generally better to keep data in a domain model, abstracted away from any one UI.
If I have a RESTful service layer, what it is exposing is the domain model. In that case, the UI (any particular screen) calls a number of these services and composes the screen from the domain models collected. In this scenario, although domain models bubble all the way up to the UI, the UI layer skims out the necessary data to build its particular screen. There are also some interesting questions on SO about using (annotated) domain models for persistence.
My point here is that domain models can be a single source of truth. They can do the work of carrying data and encapsulating logic fairly well. I have worked on projects which had a lot of boilerplate code translating each domain model into DTOs, VOs, DOs and what-have-yous. A lot of that looked quite unnecessary, and more due to habit in most cases.
I am building a data access layer to a DB. What data structure is recommended for passing and returning a collection?
I use a list of data access objects mapped to the db tables.
I'm not sure what language you're using, but in general, there are tradeoffs of simplicity vs extensibility.
If you return the DataSet directly, you have now coupled yourself to database-specific classes. This leaves little room for extension - what if you allow access to files or to other types of data sources? But it is also very simple. This is the recordset pattern, and C#/VB provide a lot of built-in support for it. The GUI layer can access the recordset and easily manipulate the data. This works well for simple applications.
On the other hand, you can wrap the datasets in a custom object, and provide gateway methods (see the Gateway pattern: http://martinfowler.com/eaaCatalog/gateway.html). This method is more complex but provides a lot more extensibility. In a larger application, when you need to separate the business logic, data logic, and GUI logic, this is a more robust way to go.
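A rough Java sketch of that gateway idea (table and class names invented); the GUI talks to this, never to a recordset directly:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

// Gateway: the rest of the app sees domain-ish objects, not recordsets.
class OrderGateway {
    private final Connection conn;

    OrderGateway(Connection conn) { this.conn = conn; }

    List<Order> findByCustomer(long customerId) throws Exception {
        List<Order> out = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, total FROM orders WHERE customer_id = ?")) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    out.add(new Order(rs.getLong("id"), rs.getBigDecimal("total")));
                }
            }
        }
        return out;
    }
}

class Order {
    final long id;
    final java.math.BigDecimal total;
    Order(long id, java.math.BigDecimal total) { this.id = id; this.total = total; }
}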
For larger enterprise applications, you can look into using Object-Relational Mapping tools (ORMs). They help automatically map Java objects to database tables and hide a lot of the painful SQL details. Frameworks such as Spring provide excellent support for ORMs.
I tend to use arrays of objects, so that I can disconnect the DAO from the business logic.
You can store the data in the DAO as a dataset, for example, and give callers an easy way to add to the database before doing an update; they can pass in information for modification operations, and then when they want to commit the changes they can do it in one shot.
I prefer that the user can't add/modify the structure themselves, as it makes it harder to determine what must be changed in the database.
By initially returning an array they can then display what is in the database.
Then, as the presentation layer makes changes, the DAO can be updated by the controller. By having a loose coupling the entire system becomes more flexible, as you can change the DAO from a dataset to something else, and the rest of the application doesn't care.
There are two choices that are the most generic.
The first way to look at a ResultSet is as a List of Maps, where each Map represents a row in the ResultSet. The keys are the columns listed in the SELECT clause; the values are the database values.
The second way to look at a ResultSet is as a Map of Lists, where each List represents a column in the ResultSet. The Map keys are the columns listed in the SELECT clause; the values are the Lists of database values.
If you don't want to do full-blown ORM, these can carry you a long way.
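As a sketch of the first shape in Java, built with JDBC's ResultSetMetaData so it works for any query:

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ResultSets {
    // One Map per row; keys are the column labels from the query.
    static List<Map<String, Object>> toListOfMaps(ResultSet rs) throws Exception {
        ResultSetMetaData meta = rs.getMetaData();
        int cols = meta.getColumnCount();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 1; i <= cols; i++) {
                row.put(meta.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }
}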