When and how should Spring Data + JPA schema and DB initialization be used? - spring-boot

I'm working on the simple task of adding a new table to an existing SQL DB and wiring it into a Spring Boot API with Spring Data.
I would typically start by defining the DB table directly, creating the PK and FKs, etc., and then creating the Java bean that represents it, but I am curious about using the Spring Data initialization feature.
I am wondering when and where Spring Data + JPA's schema generation and DB initialization may be useful. There are many tutorials on how it can be implemented, but the when and why are not as clear to me.
For example:
Should I convert my existing lower-environment DBs (hand coded) to be initialized automatically? If so, by dropping the existing tables and allowing the app to execute DDL?
Should this feature be relied on at all in a production environment?
Should generation or initialization be run only once? Some tutorials mention this process running continually, but why would you choose to lose data that often?
What is the purpose of the drop-and-create JPA action? Why would you ever want to drop tables? How are things like UAT test data handled?

My two cents on these topics:
Most people may say that you should not rely on automated database creation because the database is a core part of your application and you might want to take over the task yourself so that you know for sure what is really happening. I tend to agree with them. Unless it is a POC or something not production critical, I would prefer to define the database details myself.
In my opinion, no.
This might be OK in non-production environments, or in early and exploratory development. Definitely not in production.
On a POC or in early and exploratory development this is OK. In any other case I don't see this being useful. Test data might also be part of the initial setup of the database; Spring allows you to do that by defining an SQL script that inserts data into the database on startup.
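For example, a minimal sketch of that startup-script approach (the file locations are standard Spring Boot conventions, but the table, the values and the exact initialization property are assumptions to verify against your Boot version):

-- src/main/resources/data.sql (hypothetical table and values)
INSERT INTO country (id, iso_code, name) VALUES (1, 'DE', 'Germany');

# src/main/resources/application.properties
# needed so the script also runs against a non-embedded database;
# newer Boot versions use spring.sql.init.mode=always instead
spring.datasource.initialization-mode=always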
Bottom line: in my opinion, you should not rely on this feature in production. Instead you might want to take a look at Liquibase or Flyway (nice article comparing both: https://dzone.com/articles/flyway-vs-liquibase), which are fully fledged database migration tools that you can rely on even in production.

My opinion in short:
No, don't rely on Auto DDL. It can be a handy feature in development but should never be used in production. And be careful: it will change your database whenever you change something in your entities.
But, and this is why I answer, there is a possibility to have Hibernate write the SQL to a file instead of executing it. This gives you the ability to make use of the feature while still controlling how your database is changed. I frequently use this to generate scripts that I then use as a blueprint for my own Liquibase migration scripts.
This way you can implement an entity in code and run the application, which generates the Hibernate SQL file containing the CREATE TABLE statement for your newly added entity. Now you don't have to write all those column names and types for the database table yourself.
To achieve this, add the following properties to your application.properties:
spring.jpa.hibernate.ddl-auto=none
spring.jpa.properties.javax.persistence.schema-generation.scripts.create-target=build/generated_scripts/hibernate_schema.sql
spring.jpa.properties.javax.persistence.schema-generation.scripts.action=create
This will generate the SQL script in your project folder at build/generated_scripts/hibernate_schema.sql.
I know this is not exactly what you were asking for but I thought this could be a nice hint on how to use Auto DDL in a safer way.

Related

How do I integrate Liquibase within an existing CI/CD pipeline in a large organization?

We are working in a very big organization, with many databases (of many types), many schemas, and many users.
Does Liquibase have to work with some source control (for locking the files when many users in the organization are working against the same DB, same schema, etc.)?
What is the best practice for working with Liquibase in a very big organization with many concurrent users?
Can SQLcl handle the general SQL format type, or just the XML format type?
Is there some integration with SQL Developer? I mean, suppose a user changes an object via SQL Developer, what happens then?
We get this type of question all the time; after folks get a handle on how to automate DB changes, the next step is typically to add it into an existing CI/CD workflow.
Yes, Liquibase works with any source control. Most users are using Git, but you can use Git, TFS, SVN, CVS... Once you are up and running with Liquibase, you just need to make sure that your scripts are in source control and you are good to go.
Besides third-party source control tools, Liquibase has a tracking table called DATABASECHANGELOG that keeps track of the changes applied to your database by Liquibase deployments.
Here is some more information about getting started and how Liquibase works: https://www.liquibase.org/get_started/how-lb-works.html
Liquibase has one more table that it uses internally, called DATABASECHANGELOGLOCK.
This table was designed to prevent multiple Liquibase users from running deployments concurrently, which could leave the database in a bad state. Once the Liquibase deployment (the liquibase update command) is done, the DATABASECHANGELOGLOCK is released and the next Liquibase user can deploy.
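For example, after a deployment you can inspect the tracking table directly (a minimal sketch; these are the standard Liquibase column names, while the exact table casing depends on your database):

SELECT id, author, filename, dateexecuted, exectype, md5sum
FROM databasechangelog
ORDER BY dateexecuted;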
You can use both SQL and XML formats (or even JSON and YAML formats).
When using SQL, you have a few options:
The best option is to use formatted SQL changeLogs (a short example follows this list):
https://www.liquibase.org/documentation/sql_format.html
https://www.liquibase.org/get_started/quickstart_sql.html
You can also use plain raw SQL files referenced from an XML changeLog:
https://www.liquibase.org/documentation/changes/sql_file.html
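As a quick illustration of the formatted SQL style mentioned as the first option above, here is a minimal sketch of a changeLog with a single changeSet (the author, id, table and columns are made up):

--liquibase formatted sql

--changeset jsmith:create-widget-table
CREATE TABLE widget (
    id   INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);
--rollback DROP TABLE widget;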
When using XML, you can find all the available change types (also called changeSets) on the following page (listed on the left of the page):
https://www.liquibase.org/documentation/changes/
XML changeLogs are more agnostic and can sometimes be used for different database platforms when doing migrations. Also, many of the change types in XML can be rolled back automatically. This is possible with XML because Liquibase uses its own built-in functions to figure out inverse statements, for example turning "create table" into "drop table".
For each of those changeSets you can find out whether they are auto-rollback eligible (at the bottom of the page). For example, the createTable changeSet is Auto Rollback = yes:
https://www.liquibase.org/documentation/changes/create_table.html
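As an illustration, a minimal sketch of an XML changeLog using the createTable change type (the id, author, table name and referenced XSD version are assumptions); because Liquibase knows the inverse of createTable, this changeSet can be rolled back without an explicit rollback block:

<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
            http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">

    <changeSet id="create-widget-table" author="jsmith">
        <createTable tableName="widget">
            <column name="id" type="int">
                <constraints primaryKey="true" nullable="false"/>
            </column>
            <column name="name" type="varchar(255)"/>
        </createTable>
    </changeSet>
</databaseChangeLog>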

Targeting multiple database types with generated JOOQ code

I imagine that it nowadays is quite common to use one RDBMS during development and another RDBMS in production. I'd like to use H2 in development and MariaDB in production for a Spring Boot and JOOQ based application.
Is there some clever way to make the same generated JOOQ code work in both development and production environments, or do I need to generate two sets of code depending on the target environment? If the latter is true, how to do that in a sane way e.g. using the nu.studer.jooq gradle plugin?
Exceptions like this are thrown whenever I try to use the sources generated from a H2 database against a MariaDB server:
org.mariadb.jdbc.internal.util.dao.QueryException: SELECT command denied to user 'foo'@'localhost' for table 'FOO'
Query is: select `PUBLIC`.`FOO`.`ID`, `PUBLIC`.`FOO`.`NAME`, `PUBLIC`.`FOO`.`INFO` from `PUBLIC`.`FOO`
I use the same flyway initialization/migration scripts for both H2 and MariaDB.
You don't need to generate two sets of classes, one per environment. jOOQ's generated classes are pretty vendor agnostic, unless you use vendor-specific features, e.g. MariaDB's enum type or stored procedures.
The error you're getting is probably related to one of these things:
You might not have a PUBLIC schema in your MariaDB database. You can either make sure the schema names match between H2 and MariaDB, or you can turn off schema name generation in jOOQ by using either Settings.renderSchema on your configuration, or by using a schema mapping.
Different databases have different default case-sensitivity settings. In H2, by default, all tables are upper case, but this might not be the case in your MariaDB installation. You can either make sure the casing is the same in both databases, or you can turn off the generation of the backticks / quotes. This can be done with Settings.renderNameStyle, setting it to AS_IS (see the sketch after this list).
It might be unrelated to jOOQ and you simply don't have the appropriate privilege to query the table.
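A rough sketch of the two settings mentioned above (this assumes jOOQ 3.x, where RenderNameStyle still exists, and a javax.sql.DataSource obtained elsewhere; it is an illustration, not the answer's original code):

import javax.sql.DataSource;

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.conf.RenderNameStyle;
import org.jooq.conf.Settings;
import org.jooq.impl.DSL;

public class JooqConfig {

    public DSLContext dsl(DataSource dataSource) {
        Settings settings = new Settings()
                .withRenderSchema(false)                      // don't prefix tables with PUBLIC
                .withRenderNameStyle(RenderNameStyle.AS_IS);  // no backticks/quotes, names rendered as-is
        return DSL.using(dataSource, SQLDialect.MARIADB, settings);
    }
}

With those settings, the H2-generated classes render plain, unqualified names that MariaDB can resolve, as long as the actual table and column names match.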
Unrelated, a short note on using different vendors for development and production
You said:
I imagine that it nowadays is quite common to use one RDBMS during development and another RDBMS in production. I'd like to use H2 in development and MariaDB in production for a Spring Boot and JOOQ based application.
I really, really advise against this practice. You can very easily set up your production database in Docker and work directly against it. While H2 can emulate a couple of MariaDB features, it is nowhere near the same. By artificially restricting yourself to the least common denominator of H2 and MariaDB, you're missing out on a lot of cool MariaDB features, including CTEs, window functions, stored procedures, etc. And you will constantly fight the subtle differences between the vendors at various levels of your stack.
You should only do this when:
You actually need to support several databases in production
You really, really, really benefit from the slightly increased performance, e.g. for integration testing (but I doubt it, with Docker).

Using Liquibase to version table definitions, not change sets

I'd like to version only the latest table definitions in my repository (no change sets), and have Liquibase figure out which changes are needed when patching my databases. Please take note that I have a very big database schema (1000+ tables) installed at hundreds of customer sites, each on a different version, and I really don't know which objects each version has.
How can I make a Liquibase-based installer for my application, given my set of table definitions and hundreds of databases, each running one of about 12 different versions of the objects?
To be more specific, I'd like Liquibase to compare my table definitions with the production database and emit the ALTER TABLE statements required to bring the database up to my latest version.
I could contribute code if necessary in order to get this done.
Liquibase and tools like it (for example Flyway) are primarily designed to support database migrations. A migration is where every change to the DB is tracked so that it can be replayed on target environments, thereby keeping them in sync with development (although time-shifted). It's all about keeping your schema under revision control.
Your use case is a little different. If I understand correctly you're trying to retrofit Liquibase onto a series of environments that you are not 100% certain match your application's current schema?
I would only recommend migration tools like Liquibase if you intend to use them going forward. If all you want is a DB diff tool, I would suggest you look elsewhere.
To perform an initial sync, I would suggest you investigate the diffChangeLog command, coupled with the changelogSync command, to initialize Liquibase on the target DB.
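A rough sketch of what that could look like from the CLI (the command and parameter names follow the pre-4.x camelCase style, and the JDBC URLs, users and file name are placeholders, so verify them against your Liquibase version):

liquibase --changeLogFile=baseline.xml \
          --url=<jdbc-url-of-customer-db> --username=<user> --password=<pass> \
          --referenceUrl=<jdbc-url-of-reference-db> --referenceUsername=<user> --referencePassword=<pass> \
          diffChangeLog

liquibase --changeLogFile=baseline.xml \
          --url=<jdbc-url-of-customer-db> --username=<user> --password=<pass> \
          changelogSync

The first command writes the differences between the reference database and the target into baseline.xml; the second marks the changeSets in that file as already executed on the target, so Liquibase is initialized without re-running them.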

NHibernate NUnit - clear database between test cases

We have a rather extensive test suite that takes forever to execute.
After each test has completed, the database (MSSQL) needs to be emptied so it is fresh for the next testcase.
The way we do this is by temporarily removing all foreign keys, TRUNCATE'ing all tables, and re-adding the FKs.
This step takes somewhere between 2-3 seconds, according to NHProfiler. All the time is seemingly spent with the FK operations.
Our current method is clearly not optimal, but which way should we go to improve the performance? The number of elements actually deleted from the DB is completely insignificant compared to the number of operations for the FK removal/addition.
Using an in-memory SQLite database is not an option, as the code under test uses MSSQL specific operations.
You could wrap everything in a transaction and at the end just roll back everything. That's how I do it. It also allows you to run tests in parallel.
What about using SQL Server Compact: create the database from the mapping files using NHibernate schema create and load the data for each test, if you are talking about a trivial amount of data.
Have a look at this blog post - Using SQL Server Compact Edition for Unit testing
Alternatively, you could use Fluent Migrator to create the database schema and load the data for each test.
Why are you even using a DB in your tests? Surely you should be mocking the persistence mechanism? Unless you're actually trying to test that part of the functionality you're wasting time and resources actually inserting/updating/deleting data.
The fact that your tests rely on MS SQL specifics and returned data hints at the possibility that your architecture needs looking at.
I'm not meaning to sound rude here - I'm just surprised no one else has picked you up on this.
There are a couple of things that I've done in the past to help speed up database integration tests. The first thing I did was write a SQL script that creates the entire database from scratch. This can be easily accomplished using a tool like Red Gate SQL Compare against a blank database.
Second, I created a script that removed all of the database objects from an existing database.
Then I needed a script that populated the database with test data. Again, simple to create using the Red Gate tools. You don't need/want a ton of data here, just enough to cover your test cases.
With those items in place, I created one test class with all of my read-only operations in it. In the init of that class, I cleared a local SQL Server Express instance, ran the create script and then ran the populate script. This ensured the database was initialized correctly for all of the read-only tests.
For tests that actually manipulate the database, we did the same routine as above, except that we did it on test init as opposed to class init.
Obviously the more database manipulation tests you have, the longer it will take to run all of your tests. If it becomes unruly, you should look at categorizing your tests and only running what is necessary locally and running the full suite on a continuous integration server.

Migrating and Backing up Schemas (complex database structures)

Hey guys,
I need to figure out a way to back up and also migrate our Oracle database from our production schema to the dev schema and the other way around.
We have a bunch of config tables that drive how systems on our platform run, and when setting up new systems or doing maintenance, we need to update our config tables. We want to be able to work on the dev schemas and, after setting up a system/feature, migrate all those configs to the production schemas.
I thought of running a procedure where we give the ID of the system (from the main table), and I would go through all the tables and SELECT NVL(..); if the row doesn't exist I would INSERT it, and if it does exist I would just run an UPDATE on that row.
This code will get very messy and complicated, especially since the whole config schema is very complex and it might be hard to handle all the keys properly.
Another option I was looking at was triggers, so that when setting up a new system there would be a log of all the statements we ran while setting up/editing it, and then we would run that log on our production schema.
I'm on a co-op term and have only been working with databases for 6 months, so I don't know that much, and any information/advice would be greatly appreciated.
(We use PL/SQL.)
What about using export/import (or Data Pump) to bring over the config tables?
Check out data comparison tools like this
I think TOAD has one built in. I'm sure there are others out there too.
It is common to have tables in a schema that hold what we call "static data", i.e. data the users don't change because it controls how the application works.
Each change to config data should not be run ad-hoc in the target environment. Instead, you design and code your DML carefully in one or more scripts, which get tested in a dev environment, checked into change control, and can be re-run in any environment when required.
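For the insert-or-update half of the problem specifically, a single MERGE per config table is usually cleaner than the NVL-then-insert-or-update procedure described in the question. A minimal sketch, with an invented config_item table and bind variables standing in for the values being promoted:

-- one MERGE per config table, run from a tested, version-controlled script
MERGE INTO config_item tgt
USING (SELECT :system_id  AS system_id,
              :item_key   AS item_key,
              :item_value AS item_value
       FROM dual) src
ON (tgt.system_id = src.system_id AND tgt.item_key = src.item_key)
WHEN MATCHED THEN
    UPDATE SET tgt.item_value = src.item_value
WHEN NOT MATCHED THEN
    INSERT (system_id, item_key, item_value)
    VALUES (src.system_id, src.item_key, src.item_value);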
