dbt schema cleanup in Snowflake - continuous-integration

When we create Pull Requests in GitHub it auto triggers a dbt cloud job that runs a test build of our models. The database in Snowflake for this build is called "Continuous Integration". In this database we have hundreds of schemas going back almost 2 years. Is there any reason to keep these schemas and tables? I sure would like to do some cleanup.

You should be able to delete these old schemas with no consequence.
Each of these schemas is built based on the change introduced in an earlier version of the code & (depending on how you set up your github action) either using a pre-defined test data or the raw data available at the time of the test begin run.
These CI jobs can serve two use-cases.
[primary] test the code works & data validation tests pass
they can act as a way to do time travel, which I'll describe below.
The first use-case does not need the artifact to be preserved after once the job runs
The second use-case may be important to you in trying to debug reports that were generated many months ago.
Example: lets say the finance department wants to know why a historical value of active users has changed in the latest report. this may have been an error that was fixed within your dbt logic, or perhaps the active users was pulled with an incorrect filter from your BI layer, if you had dbt artifacts built from that era, you would be able to use it to look for any dbt level changes.
How far back do you think you'd need the artifacts for time travel? Check with your stakeholders and come up with a time frame that works for your business, and you can delete all the CI artifacts built prior to that date.

Related

How do you switch between standard and serverless configurations in Amazon Aurora

I am looking a this Amazon page - https://aws.amazon.com/rds/aurora/serverless/ and it has this quote:
You pay on a per-second basis for the database capacity you use when
the database is active, and migrate between standard and serverless
configurations with a few clicks in the AWS Management Console.
I have a few normal Aurora clusters and want to switch them to serverless. I have looked and looked and cannot find the "migrate with a few clicks" bit in the Amazon user interface. I made a new serverless cluster just fine and so I could do a stop, backup, and restore with a short outage - but If I can do this without an outage - that would be far superior.
So where are these "few clicks" - or perhaps you will tell me the "few clicks" means stop, backup, and restore. Either way I think a lot of folks could benefit from knowing what "few clicks" make this happen.
As a comment on #drchuck's approach - We've learned this the hard way that AWS Database Migration Service does a bad job at creating the schema in the target database. However - there's a simple workaround:
1) Run mysqldump --no-data to get the exact schema from the source database.
2) Execute the dump'd schema on the target database.
3) Within your DMS task, under target table preparation mode, choose "Truncate" instead of "Drop tables on target". (https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.Creating.html)
With this in place, DMS doesn't create the schema on the target side, and things work pretty well (all existing data is loaded, and then ongoing changes are sync'd in near-real-time).
We've used this approach for minimal downtime cutovers a few times.
It took more than a while to figure out those few clicks.
I'm here initially as I too could not find them and yes I saw the exact quote on the AWS page you indicated saying that yes you could.
First you take a snapshot and then you restore it. In the process of restoring it you can select a serverless instance. (At least under SOME conditions. I do not think that a 5.7.12 (just confirmed actually) can be restored to a serverless configuration).
I suspect that 5.7.12 will happen in due time.
Right now the magic bullet is to start with a 5.6.10a version, take a snapshot and then restore that to a serveless instance.
For what it's worth after the long time:
Apparently Amazon Aurora Serverless is only compatible with MySQL 5.6 - this explains why 5.7 snapshots cannot be recovered.
So the two options are
downgrading the MySQL version to 5.6 first or
dumping and importing the data (after I read the other answers, I'd go for the second option).
Further reading:
https://aws.amazon.com/rds/aurora/serverless/?nc1=h_ls#How_to_Get_Started
When I did not get an answer in a few days, I did the conversion two ways with different results so I figured I would share my results here. I would still love to hear a better approach. (1) When I did the conversion using mysqldump and restore, with a short outage things were fine. (2) When I used AWS Database Migration Service it went pretty badly.
First, you have to get the binary log format as "ROW" and retention to 24 hours. That necessitated server restarts on my old clusters. Then when the data migration worked, I lost all my auto increments, then NULLness in my columns, the UNIQUE clauses and foreign keys in the new tables. Literally the only thing that migrated correctly was that the actual data and PRIMARY KEY indications. Also, I would recommend migrating one database at a time (i.e. schema) and don't try to migrate the mysql internal schemas. I said "migrate everything" and the migration tool tried to migrate the MySQL stuff - sheesh.
The one thing the AWS Database Migration Service did that was really cool was the migrate and monitor (made possible by the binary logging on the rows). You could watch it moving rows.
Just for the record, AWS amended the quoted documentation in mid-2022 by changing 'few clicks' to 'few steps'.🤣
You pay on a per-second basis for the database capacity that you use
when the database is active, and migrate between standard and
serverless configurations with a few steps in the Amazon Relational
Database Service (Amazon RDS) console.
Currently the documentation states that there are two (multi-step) methods that can be used to migrate from provisioned to serverless, and serverless to provisioned:
Snapshot restore.
Logical backup and restore.
Details here.

How to setup SonarQube for a large organization

I am in the process of setting up SQ for a large organization. I plan to have two separate systems one for update testing and rule development. The second would be the production system where real work occurs. I will be using SQL 2014 typically when I do that I use a SQL always On group to sync to a DR server in another datacenter. My question is with a SonarQube instance does it make sense to DR the application to that level. If my organization can wait for a period of time to stand up a new server in a DR event would that be possible with a proper backup of the DB? Further if there were no backups of the DB what would be lost with a fresh new SonarQube server besides setup/config time? Is there historical value of code scans that would be lost or would the next scan of the code base have us right back to where we were in terms of critical issues found etc.?
Thanks for your replies.
All the data is stored in the database so using DR on the database is a good idea. You should make backup of the database and restoring the database is also a good solution (note that you should do backup of installed plugins).
If you loose the database, you will also loose all the configuration (quality profiles, credentials, etc.) and the history of the analyzed projects.
So to restore a SonarQube instance, you have to :
Restore the database
Restore SonarQube or install the same version
Restore the plugins (${SONAR_HOME}/extensions/plugins)
During the first start, the ES files (${SONAR_HOME}/data/es) will be regenerated and you're instance will then be up and running.
If you have commercial plugins or if you are working with large SonarQube instance you may contact the sales team to have support on this setup.
Disclaimer : I'm working at SonarSource

Why use Oracle version control?

At work we use Oracle (12c client) to store most of our data and I use SQL Developer to connect to the database environments.
Issue:
We have issues where tables are being modified for one reason or another (too lazy to create a new table so they add new columns and change data types or lengths). This in return will break the table for others who actually utilize it for its real purpose.
Update:
We have DEV, TST, UAT, and PRD environments. We test and have scripts approved before we promote to PRD. The problem resides in DEV when we want to go back to an existing table to make an change, but that table had already been modified for different reasons.
Question 1:
Is the versioning just for stored procedures or is it possible to track changes to table structures, functions, triggers, sequences, synonyms, etc.?
As Bob Jarvis indicates you need way more than a solution to your question. You need policies and practices enforced for all developers. Some ideas from places I have worked:
every developer has a VM machine with a copy of the database installed. They can do whatever they like on it but must supply scripts to move their changes to production. These scripts are applied on a test instance and again on a QA instance before going to production.
subversion works on all OS and tortoise works well on windows. Committing scripts to a repository works well and this is integrated with SQL developer and can be done with Toad.
you have a permissions issue. Too many people have the privileges to alter tables. Remove these permissions and centralize on one or two people. Changes are funnelled through them as scripts and oversight can be applied there. Developers can have their own schema to test or a VM with a copy for development.
run this script to see who can alter tables
select * from DBA_TAB_PRIVS
WHERE PRIVILEGE = 'ALTER'
The key is a separation of concerns. Developers should have access to a schema where they can do what they need. The company needs to know who did what, when and where.
If you have more than one developer working on multiple changes to a dev environment then you need coordination and communication as well as source control. A weekly meeting to discuss overlap areas or a heads up chat message are just some ways to work together.
The approach I think works best, is to have a DEV database where all the developers manage their own set of schemas.
Scripted builds are provided with test data loads to allow any developer to create his own working schema. He then works on there, tests his changes and then commits his changes via scripts to the source control. DEV databases do not need to be large, just need enough test cases to allow for unit tests.
Script all the changes so that they can be checked into a version control system, and merged with other changes. The goal is to have a system where devA checks in changeA, and then when merged with the main trunk, devB gets changeA as he builds his schemaA.
This approach requires care if the main project schema employs PUBLIC synonyms. You will need to consider this as you go forward.
I would also advise with each change checked in an accompanying back out script should be checked in.
The advantage of this approach is that devs can manage their own schemas. With a scripted approach they dont all need to have DBA knowledge, and don't need to manage the database either. having all these on one database makes it easier to manage and control resources.
I've used this approach in teams with 50+ developers and it has worked very well.
This approach also paves the way for having devs checking scripts in and having a automatically creating a deployment package.
There is so much that can be done to make the development-test-deploy-backout cycle easier to manage.

Run workflow on 1:N related records

I have developed an entity intended for staging imported data and doing some validation, as well as an entity that will be used as a batch reference, so that I can ensure all staged records in a batch will go into the system together. How can I run a workflow on all the related child records from the batch entity form?
Because standard CRM Workflow steps don't support the iteration of 1:N relationships, you need to build a custom workflow activity.
Lucky for us there is already one available:
CRM 2011 Distribute Workflow Activity
There is also a small tutorial (powerpoint format), the link is in the same page (last download link in the bottom)

Flyway migration in Development and Production

I searching for an way to do a different migration in production and development.
I want to create a Spring Webapplication with Maven.
In development i want to update database schema AND load test data.
In production when a new version of the application is deployed i want only change the schema and don't load test data.
My first idea was to save the schema update and insert statements into different folders.
I think every body has solved this problem and can help me, thank you very much.
Basically, you have two options:
You could use different locations for your migrations in your flyway.locations property, i.e.:
for Test
flyway.locations=sql/structure,sql/test
for Production
flyway.locations=sql/structure
That way, you include your test data in the sql/test folder. You would have to take care with numbering, of course.
The second option (the one I prefer), is don't include test data in your migrations at all.
Rather, create your testdata any way you want and create an sql-dump of this data, which you keep separate from your migrations.
This works best if you have a separate database (instance, schema, whatever) containing your pristine testdata, where you apply each migration as part of your build process. This build job could then create a dump always matching the current migration.
When preparing your test machine, you first apply your migrations, then you load the contents of the matching dump.
I think this is a lot cleaner than the first version, especially because your test data can be prepared using other tools (your application) and has not to be handcoded.

Resources