Debugging Delta table source in Databricks

Suppose my project has many Delta tables, and I want to find out which source a particular Delta table in Databricks is loaded from. How can I efficiently find which script or source is loading a particular Delta table in a Databricks workspace?
I'm looking for help within the Databricks workspace, and I'm expecting results related to debugging a table: what its sources are, which notebook loads it, etc.

It's possible with Databricks Unity Catalog and the lineage functionality inside it - it gives you the ability to track which data sources were used to build a specific table, and which notebooks, jobs, etc. are reading from it or writing to it.
Without Unity Catalog it's harder - you would need to plug in something external yourself, for example OpenLineage; there is a presentation about it, along with instructions on how to set it up with Purview.
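If Unity Catalog lineage system tables are enabled in your workspace, you can also query lineage directly with SQL. A minimal sketch, assuming the system.access.table_lineage system table is available; the table name is a placeholder:

```sql
-- Upstream sources and the entities (notebooks, jobs, pipelines) that
-- read from or wrote to a given Delta table, most recent events first.
-- 'main.my_schema.my_delta_table' is a placeholder name.
SELECT
  source_table_full_name,   -- where the data came from
  target_table_full_name,   -- where the data was written
  entity_type,              -- e.g. NOTEBOOK, JOB, PIPELINE
  entity_id,
  event_time
FROM system.access.table_lineage
WHERE target_table_full_name = 'main.my_schema.my_delta_table'
   OR source_table_full_name = 'main.my_schema.my_delta_table'
ORDER BY event_time DESC;
```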

Related

Is there a way to recreate an ODI package using ODI Scenario?

I mistakenly deleted a very large ODI package from my project. Is there a way to recreate the same package if I have a previously exported scenario for the same project?
Unfortunately there isn't any way to directly regenerate a deleted package from a scenario, which you can think of as a compiled version of the package.
Here are a couple of things to check to see if you can retrieve some information:
- When promoting from one environment to another, it's recommended to export the base object along with the scenario. That way you keep track of the code behind that scenario (black box). Bonus points if it's all versioned in a version control system (VCS).
- Starting with ODI 12.2.1, a VCS can be directly integrated within ODI to version your code and create release archives. If that's the case, you can restore the deleted object from there.
- Before 12.2.1, it was possible to use the internal versioning system of ODI, which stores the objects in the Master Repository. You can restore from it through the top menu.
- If none of that is set up, you can still open the scenario export in any file editor and manually go through it to retrieve the logic. It's just an XML file that describes the different steps of your package, which will help you rebuild it manually.
If you end up going with that last bullet point, now is probably a good time to improve your existing procedures and set up one of the three backup/versioning solutions mentioned above so it doesn't happen again in the future.

Oracle SQLDeveloper Database Diff functionality doesn't consider dependencies

I'm working on creating a deploy script to migrate new development from our dev server to our UAT server. Unfortunately, the devs that made the changes didn't script them out as they coded. The easiest way for me to approach this is to use SQL Developer's database diff functionality. It does a good job of highlighting the differences and creating a script that I can run on UAT.

However, I've noticed that it doesn't take dependencies into account. For example, it will put the command to create a table below that of another table that references it in a foreign key constraint. Because the referenced table doesn't exist yet, the first create command fails. I've seen this with views referencing packages, packages referencing packages, etc.

Is there any easy way to either 1) force SQL Developer to export in a "smarter" order, or 2) manually calculate the dependencies (e.g. by querying USER_DEPENDENCIES) so that I can sort the file of create commands without resorting to trial and error? I guess we could consider purchasing a commercial product as long as it matched exactly what we are looking for.
Note: we will probably have to deploy to UAT multiple times in order to support testing by end-users. I am trying to automate this as much as possible so I don't have to manually recreate this script every single time!
Thanks!
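For option 2, here is a rough sketch of deriving a creation order from the Oracle data dictionary. This is placeholder logic, not a complete solution; note in particular that table-to-table foreign key dependencies are not recorded in USER_DEPENDENCIES, so FKs need a separate pass over USER_CONSTRAINTS:

```sql
-- Approximate topological order: objects that sit deeper in a dependency
-- chain get a higher level, so creating in ascending create_order
-- satisfies most dependencies (packages, views, etc.).
SELECT name, type, MAX(lvl) AS create_order
FROM (
    SELECT d.name, d.type, LEVEL AS lvl
    FROM   user_dependencies d
    CONNECT BY NOCYCLE PRIOR d.name = d.referenced_name
                   AND PRIOR d.type = d.referenced_type
)
GROUP BY name, type
ORDER BY create_order, type, name;

-- Table-to-table FK dependencies come from the constraints views instead:
SELECT c.table_name AS child_table,
       p.table_name AS parent_table
FROM   user_constraints c
JOIN   user_constraints p ON p.constraint_name = c.r_constraint_name
WHERE  c.constraint_type = 'R';
```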

Allow (once) column drop in database project

I would like to drop one column in an existing table. When I simply remove it from the table's create script, it causes an error (data loss...) on deployment. I would like to allow the column drop in this one case. How would you do that?
To disable the data loss error:
Click the Options icon in your schema comparison file.
Uncheck "Block on possible data loss".
The setting will change for just that one schema comparison and will be saved within the schema comparison file. If you only want to do this once, you'll need to re-enable the option after you drop the column.
We did this by creating a PreDeployment script to drop the column. The reasoning is that we do not want to allow data loss for all objects in the database.
You can create automated version checks to do this only once (see my answer to another post, Nontrivial incremental change deployment with Visual Studio database projects, for steps on how to automate this with SSDT).
Or you can just supply the script to devops and include instructions in your install manual to run it once for a specific release.
After the release has gone live, you can delete the PreDeployment script.
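A minimal sketch of such a PreDeployment script, written so it is safe to run more than once (the table and column names are placeholders):

```sql
-- Pre-deployment script: drop the column only if it still exists,
-- so repeated deployments of the same release don't fail.
IF COL_LENGTH('dbo.MyTable', 'ObsoleteColumn') IS NOT NULL
BEGIN
    ALTER TABLE dbo.MyTable DROP COLUMN ObsoleteColumn;
END
```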

Get CDC tables in Visual Studio 2012/2010 database project

I am trying to create a database project in Visual Studio 2012/2010 where I need the CDC (Change Data Capture) tables, because a lot of my views depend on them. I couldn't find any way to import the CDC schema/tables :(. I've read in many blogs that importing CDC is not supported. Is there any workaround? Please suggest.
Generally you wouldn't really want the CDC tables to be created by a database project; you want them to be created using sys.sp_cdc_enable_table.
If you allow the database project to create the tables in the normal manner, the CDC tables would end up existing, but Change Data Capture wouldn't actually be enabled on them.
Obviously you can script the calls to sys.sp_cdc_enable_table in either Pre or Post scripts, but as far as I can tell neither place is ideal.
If you put the sys.sp_cdc_enable_table calls in a Pre script, chances are that not all the original tables exist yet (on a fresh deploy none of them will exist), or that these original tables will change shape as part of the main deploy that occurs after Pre is run.
If you put the sys.sp_cdc_enable_table calls in the Post script, you can't have views that rely on the CDC tables deployed as part of the main database project deployment (not without errors or warnings in your DB project).
I would suggest not having too many views, functions or stored procs that rely on the existence of the CDC tables, but it sounds like it might be too late for that.
Side note: generally speaking, you should be using the cdc.fn_cdc_get_all_changes_ and cdc.fn_cdc_get_net_changes_ functions rather than referencing the CDC tables directly; see Querying Change Data Capture data.
However, that just moves the problem along one level to those functions not existing in your project.
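For reference, a hedged sketch of querying through those generated functions (the capture instance name dbo_MyTable is a placeholder):

```sql
-- Read all changes for a capture instance between the earliest and
-- latest available LSNs using the generated table-valued function.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_MyTable');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_MyTable(@from_lsn, @to_lsn, N'all');
```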
Unfortunately, there doesn't seem to be a good way to have those extra views without scripting them at the same point that you script your calls to sys.sp_cdc_enable_table.
(I'm honestly hoping someone else will come along with a better answer that actually solves the problem)
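If you do go the Post script route, a minimal sketch of an idempotent sys.sp_cdc_enable_table call (dbo.MyTable is a placeholder):

```sql
-- Post-deployment script: enable CDC at the database level once,
-- then enable it per table only if it isn't tracked already.
IF (SELECT is_cdc_enabled FROM sys.databases WHERE name = DB_NAME()) = 0
    EXEC sys.sp_cdc_enable_db;

IF NOT EXISTS (SELECT 1 FROM sys.tables
               WHERE object_id = OBJECT_ID('dbo.MyTable')
                 AND is_tracked_by_cdc = 1)
BEGIN
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'MyTable',
        @role_name     = NULL;  -- no gating role; adjust to your security model
END
```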
We came across the same issue recently in a DB project using TFS/VS for builds during the implementation stage.
As Scott correctly said, it's not a good idea to attempt to implement either the cdc schema or any of its objects inside the project/solution; this has unfortunate consequences which you really do not want to experience!
For TFS/VS projects, you should implement a Pre-/PostDeployment script strategy, whereby CDC is disabled during the PreDeployment process and the capture instances are recreated during the PostDeployment process.
This way you can be assured that the correct CDC instances are created uniformly.
When you consider the dependent views, again, the strategy is not difficult.
Create the view as you normally would in the project, but as a placeholder (e.g. a simple CREATE VIEW dbo.vMyView AS SELECT 1 FROM SomeTable). In the PostDeployment scripts, add a further script that executes AFTER the CDC instances have been created, with ALTER statements for those views (e.g. ALTER VIEW dbo.vMyView AS SELECT Col1 FROM cdc.MyCDC_CT). Remember, a view, once created, can exist even if the underlying table doesn't. The pattern is sketched below.
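A minimal sketch of that placeholder-then-ALTER pattern (all object names are placeholders; the real change table name follows the cdc.<capture_instance>_CT convention):

```sql
-- In the database project: a placeholder view that compiles without CDC.
CREATE VIEW dbo.vMyView
AS
SELECT 1 AS Col1;
GO

-- In a post-deployment script, AFTER sys.sp_cdc_enable_table has run:
-- repoint the view at the real CDC change table.
ALTER VIEW dbo.vMyView
AS
SELECT Col1
FROM cdc.dbo_MyTable_CT;
GO
```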

VS2008 DataSet Wizard doesn't match tables for updating

first question ever on this site.
I've been having a real stubborn problem using Visual Studio 2008 and I'm hoping someone has figured this out before.
I have 2 libraries and 1 project that use strongly typed datasets (MSSQL backend) that I generated using the "Configure DataSet with Wizard" option in Data Sources. I've had them working just fine for a while, and I've written a lot of code in the non-designer file for the row classes. I've also specified a lot of custom queries using the dataset designer. This is all work I can't afford to lose.
I've recently made some changes to re-organize my libraries which included changing the names of the libraries themselves. I also changed the connection string to point to a different database which is a development copy (same exact schema).
Problem is, now when I open up "Configure DataSet with Wizard" to pick up a new column I've added to one of the tables, it no longer matches the tables correctly in the wizard. The wizard displays all of the tables in the database, and none of them have check boxes next to them (i.e. they are not part of this dataset). Below those it shows all of the tables again, but with red Xs, and these are checked. Basically, Visual Studio sees all of the tables it currently has in the DataSet and sees all of the tables in the database, but believes they are no longer the same and thus do not match!
I've had this same thing happen quite a while back, and I think I just rebuilt the XSD from scratch, manually copied the code over, and then had to redefine all of the custom queries I had built in the dataset designer. That's not a good solution.
I'm looking for 2 answers:
1. What causes this to happen, and how can I prevent it?
2. How do I fix this so that the wizard once again believes the tables in its XSD are the same tables that are in the database (yes, they still have the exact same names)?
Thanks.
The dataset designer uses the default query (The first one with a check on it) to sync up the schema for each table. Whenever you go to edit the default query, VS will actually connect to your datasource and look for changes in the query. If new columns are added, they will show up as new columns for you to add to your table. Renamed columns show up as new, since VS doesn't have any way to know that you changed the name.
Answer 1. The XSD file contains the names of the database tables that were originally used to create the DataSet tables. If you change the name of a table, the designer won't know which table to sync to.
Answer 2. You can edit the XML inside the XSD file. Do a "Find and Replace" inside the XSD file, replacing the old table name with the new table name. Make sure you have a backup of the XSD file before you do. Be careful to only change instances of the old table name and not any other working XML.
