Is there a way to make the dbt_cloud_pr_xxxx_xxx schema a clone of existing data? - continuous-integration

I'm using dbt Cloud with a run on every pull request, but my incremental models are fully refreshed because everything runs in a new database destination (dbt_cloud_pr_xxxxx_xxx). Is there any way of solving this? Perhaps by creating the new destination as a clone of an old one?

dbt calls this "Slim CI". You can use their "deferral" and "state comparison" features -- they compare the manifest of the compiled project against the manifest from another run you specify (typically the last production run). Any models that are unchanged will have ref() compile to the prod target, and then you can use the state:modified+ selector in your dbt Cloud job definition to only rebuild the models with changes.
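As a rough sketch of what the job step looks like (the ./prod-run-artifacts path is a placeholder -- in dbt Cloud the deferral settings supply the production manifest for you):
# Rebuild only modified models and their downstream dependents, deferring
# unchanged refs to the production target.
dbt build --select state:modified+ --defer --state ./prod-run-artifacts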
See the docs for CI in dbt Cloud.

Related

dbt schema cleanup in Snowflake

When we create Pull Requests in GitHub it auto triggers a dbt cloud job that runs a test build of our models. The database in Snowflake for this build is called "Continuous Integration". In this database we have hundreds of schemas going back almost 2 years. Is there any reason to keep these schemas and tables? I sure would like to do some cleanup.
You should be able to delete these old schemas with no consequence.
Each of these schemas is built from the change introduced in an earlier version of the code and (depending on how you set up your GitHub action) uses either pre-defined test data or the raw data available at the time the test run began.
These CI jobs can serve two use-cases:
1. [primary] test that the code works and that data validation tests pass
2. act as a way to do time travel, which I'll describe below.
The first use-case does not need the artifact to be preserved once the job has run.
The second use-case may be important to you when trying to debug reports that were generated many months ago.
Example: let's say the finance department wants to know why a historical value of active users has changed in the latest report. This may have been an error that was fixed in your dbt logic, or perhaps active users was pulled with an incorrect filter in your BI layer. If you had dbt artifacts built from that era, you could use them to look for any dbt-level changes.
How far back do you think you'd need the artifacts for time travel? Check with your stakeholders and come up with a time frame that works for your business, and you can delete all the CI artifacts built prior to that date.
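If you settle on a retention window, a throwaway script along these lines could handle the cleanup. This is only a sketch: the CONTINUOUS_INTEGRATION database name, the 90-day window, and the snowsql connection setup are assumptions, and you should review the generated statements before running them.
# Generate DROP statements for CI schemas older than the retention window,
# review the file, then apply it.
snowsql -o friendly=false -o timing=false -o header=false -o output_format=plain -q "
  SELECT 'DROP SCHEMA IF EXISTS CONTINUOUS_INTEGRATION.' || schema_name || ';'
  FROM CONTINUOUS_INTEGRATION.information_schema.schemata
  WHERE created < DATEADD(day, -90, CURRENT_TIMESTAMP())
    AND schema_name NOT IN ('INFORMATION_SCHEMA', 'PUBLIC');" > drop_old_ci_schemas.sql
snowsql -f drop_old_ci_schemas.sql   # only after reviewing the generated file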

Using Cloud source repository in a GCP production project

I have a standalone Cloud Source Repository (not cloned from GitHub).
I am using this to automate the deployment of ETL pipelines, so I am following Google's recommended guidelines, i.e. committing the ETL pipeline as a .py file.
The Cloud Build trigger associated with the Cloud Source Repository runs the steps in the cloudbuild.yaml file and puts the resulting .py file in the Composer DAG bucket.
Composer then picks up this DAG and runs it.
Now my question is: how do I orchestrate the CI/CD across dev and prod? I did not find any proper documentation for this, so as of now I am following a manual approach: if my code passes in dev, I commit the same code to the prod repo. Is there a better way to do this?
Cloud Build Triggers allow you to conditionally execute a cloudbuild.yaml file in various ways. Have you tried setting up a trigger that fires only on changes to a dev branch?
Further, you can add substitutions to your trigger and use them in the cloudbuild.yaml file to, for example, name the generated artifacts based on some aspect of the input event.
See: https://cloud.google.com/build/docs/configuring-builds/substitute-variable-values and https://cloud.google.com/build/docs/configuring-builds/use-bash-and-bindings-in-substitutions
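For example (the repo name, branch pattern, and substitution variables below are placeholders), a trigger scoped to a dev branch could be created like this:
# Fire only on pushes to the dev branch of the Cloud Source Repository;
# a prod branch or repo would get its own trigger with prod substitutions.
gcloud builds triggers create cloud-source-repositories \
  --repo=my-etl-repo \
  --branch-pattern='^dev$' \
  --build-config=cloudbuild.yaml \
  --substitutions=_ENV=dev,_DAG_BUCKET=my-dev-composer-dags
In cloudbuild.yaml you would then reference ${_ENV} and ${_DAG_BUCKET} instead of hard-coding the dev or prod destinations, so the same config file can serve both environments.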

Baseline existing database

I'm looking at Sqitch, and so far it seems like a great tool. However, I have an existing project that I want to use it with. Is there a way to create a baseline?
For example, I take a backup of my schema and add it to the deploy script. I then want to run a command that will not run this script on the database (since the schema already exists), but will apply everything after this point.
I need the full base schema in there so that we can re-deploy the whole schema if required.
You can use the --log-only option of the sqitch deploy command.
From the docs: https://sqitch.org/docs/manual/sqitch-deploy/
--log-only
Log the changes as if they were deployed, but without actually running the deploy scripts. Useful for an existing database that is being converted to Sqitch, and you need to log changes as deployed because they have been deployed by other means in the past.
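A minimal sketch of that workflow, assuming a Postgres target and a change named baseline (both placeholders):
# Add the full existing schema as the first change; put your schema dump in deploy/baseline.sql.
sqitch add baseline -n 'Baseline of the existing schema'
# Mark it as deployed on the existing database without running the script.
sqitch deploy --log-only db:pg://localhost/mydb
# From here on, new changes deploy normally; a fresh environment can run a
# plain deploy to build the whole schema from the baseline up.
sqitch deploy db:pg://localhost/mydb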

gcloud automatic redeployment Golang app

I have a Golang app running on Google Cloud App Engine that I can update manually with "gcloud app deploy", but I cannot figure out how to schedule automatic redeployments. I'm assuming I have to use cron.yaml, but then I'm confused about what URL to use. Basically it's just a web app with one main index.html page with changing content, and I would like to schedule automatic redeployments... how do I go about that?
If you want to automatically re-deploy your app when the code changes, you need what's called CI/CD (Continuous integration/deployment). What a CI does is, for each new commit to your repository, check out the new code and run a test script. If all the tests pass (or if you don't have any tests at all), the CI server can then deploy your code to App Engine, all automatically.
One free (for open-source projects) CI provider is Travis CI. To configure it, you need to make an account with Travis, and a file called .travis.yml in the root of your repository. To set up App Engine deploys, you can follow this guide to set up a service account and add the encrypted file to your repo. It will run a gcloud app deploy from a container on their servers, whenever you push code to a certain branch (master by default) in your repo.
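Whichever CI provider you pick, the deploy step it runs after the tests pass boils down to something like this sketch (GCLOUD_KEY_FILE and GCP_PROJECT are placeholder variables you would store as encrypted CI settings):
# Authenticate with a service account key, then deploy the app non-interactively.
gcloud auth activate-service-account --key-file="${GCLOUD_KEY_FILE}"
gcloud app deploy app.yaml --project="${GCP_PROJECT}" --quiet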
Another option, which avoids setting up CI at all, is to simply change your app to generate the dynamic parts of the page when it gets requested. Reading the documentation for html/template would point you in the right direction.

Development versus Production in Parse.com

I want to understand how people are handing an update to a production app on the Parse.com platform. Here is the scenario that I am not sure about.
Create an app called myApp_DEV. The app contains a database as well as associated cloud code.
Once testing is complete and ready for go-live I will clone this app into myApp_PRD (Production version). Cloning it will copy all the database as well as the cloud code.
So far so good.
Now, 3 months down the line, I have added some functionality, which includes adding some cloud code functions as well as some new columns to the tables in the database.
How do I update myApp_PRD with this new database structure? If I try to clone it from my DEV app, it tells me the app already exists.
If I clone a new app (say myApp_PRD2) from DEV, then all the data will be lost since the customer is already live.
Any ideas on how to handle this scenario?
Cloud code supports deploying to production and development environments.
You'll first need to link your production app to your existing cloud code. This can be done from the command line:
parse add production
When you're ready to release, it's a simple matter of:
parse deploy production
See the Parse Documentation for all the details.
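Putting it together, a sketch of the two-environment workflow (the app and alias names follow the example above):
parse add production        # link myApp_PRD to this cloud code directory under the "production" alias
parse deploy                # day-to-day: push cloud code to the default (dev) app
parse deploy production     # at release time: push the same cloud code to myApp_PRD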
As for the schema changes, I guess we just have to manually add all the new columns.
