Is using DDL statements in ETL script right approach? - etl

I'm working on redesigning DWH solution previously based on Teradata with lots of BTEQ scripts performing transformations on mirror tables loaded from source DBs. New solutions will be based on Snowflake and as a transformation tool set of SQL (Snowflake) scripts is being prepared.
Is that a right approach to use in ETL scripts DDL statements which create e.g. temporary table which than et the end of script is dropped?
In my opinion such table should be created before running this script instead of creating it in the script on fly. One argument opts that DDL statements on Snowflake commits transaction and that's why I want to avoid DDL statements in transformation scripts. Please help me finding pros and cons of using DDL statements in ETL process and back me up that I'm right or convince that I'm wrong.

If you are wanting transactions to cover all your SELECT/INSERT/MERGE steps of your transformation step of your ELT, you need to not create/drop any tables, as those will commit your open transactions.
We get around this by have pre-existing worker tables per task/deployment that are create/truncated prior to the transaction section of our ELT process. And our tooling does not allow a task to run at the same time.
Thus we load into a landing table, we transform into temporary tables, then we multi-table merge into the final tables. With only the last steps needing to be in transactions.

Related

Oracle Advanced Queues versus a Small Oracle Database Table

I'm looking for a simple way to communicate between two databases, there currently exists a database link between both database.
I want to process a job on database 1 for a batch of records (batch code for each batch of records), once the process has finished on database 1 and all the batches of records have been processed. I want database 2 to see that database 1 has processed a number of batches (batch codes) either by querying a oracle table or an Oracle advanced queue which sits on either database 1 or database 2.
Database 2 will process the batches of records that are on database 1 through a database linked view using each batch code and update the status of that batch to complete.
I want to be able to update the Oracle Advanced Queue or database table of its batch no, progress status ('S' started, 'C' completed), status date
Table name.
batch_records
Table columns
Batch No,
Status,
status date
Questions:
Can this be done by a simple database table rather than a complex Oracle Advanced Queue?
Can a table be updated over a database link?
Are there any examples of this?
To answer your question first:
yes, I believe so
yes, it can. But, if there are many rows involved, it can be pretty slow
probably
Database link is the way to communicate between two databases. If those jobs run on the database 1 (DB1), I'd suggest you to keep it there - in the DB1. Doing stuff over a database link calls for problems of different kinds. Might be slow, you can't do everything over the database link (LOBs, for example). One option is to schedule a job (using DBMS_SCHEDULER or DBMS_JOB (which is quite OK for simple things)). Let the procedure maintain job status in some table (that would be a "simple table" from your 1st question) in DB1 which will be read by the DB2.
How? Do it directly, or create a materialized view which will be refreshed in a scheduled manner (e.g. every morning at 07:00) or on demand (not that good idea) or on commit (once the DB1 procedure does the job and commits changes, materialized view will be refreshed).
If there aren't that many rows involved, I'd probably read the DB1 status table directly, and think of other options later (if necessary).

Dynamic Audit Trigger

I want to keep logs of all tables into 1 single log table. Suppose if any DML operation is going on any table inside DB. Than that should be logged in 1 single tables.
But there should be a dynamic trigger which will not hard coded the column names for every table.
Is there any solution for this.
Regards,
Somdutt Harihar
"Is there any solution for this"
No. This is not how databases work. Strongly enforced data structures is what they do, and that applies to audit tables just as much as transaction tables.
The reason is quite clear: the time you save not writing audit code specific to each transactional table is the time you will spend writing a query to retrieve the audit records. The difference is, when you're trying to get the audit records out you will have your boss standing over your shoulder demanding to know when you can tell them what happened to the payroll records last month. Or asking how long it will take you to produce that report for the regulators, are you trying to make the company look like a bunch of clowns? You get the picture. This is not where you want to be.
Also, the performance of a single table to store all the changes to all the tables in the database? That is going to be so slow, you have no idea.
The point is, we can generate the auditing code. It is easy to write some SQL which interrogates the data dictionary and produces DDL for the target tables and triggers to populate those tables.
In fact it gets even easier in 11.2.0.4 and later because we can use FLASHBACK DATA ARCHIVE (formerly Oracle Total Recall) to build and maintain such journalling functionality automatically, and query it automatically with the as of syntax. Find out more.
Okay, so technically there is a solution. You could have a trigger on each table which executes some dynamic PL/SQL to interrogate the data dictionary and assembles a piece of JSON which you stuff into your single table. The single table could be partitioned by day range and sub-partitioned by table name (assuming you have licensed the Partitioning option) to mitigate the performance of querying it.
But that is extremely complex. Running dynamic PL/SQL for every DML statement will have a bad effect on performance, which the users will notice. And this still doesn't solve the fundamental problem of retrieving the audit trail when you need it.
To audit DML actions on any table just enable such audit by using following code:
audit insert table, update table, delete table;
All actions with tables will then be logged to sys.dba_audit_object table.
Audit will only log timestamp, user, host and other params, not exact copies of new or old rows.

How to update/insert a table without creating a new table (temporary or otherwise)

Background: My team has an etl job that updates an aggregate table. Each row contains data for a particular date, but this row can and will get updated after the row date (which means any row can contain data from multiple jobs). This ETL job missed some data for one day last week and now I need to backfill it.
Problem: I have the missing data, and what I was planning on doing was dumping that data into a temporary table and then merging it with the agg table. That way I can deal with whether the ETL job already contains a row for that data (update) or whether a new row needs to be added (insert), but I don't have sufficient permissions to create a temp table, and I'd prefer not to involve the DBA.
Question: Can I do an insert/update sort of behavior without creating a temporary table (this is Oracle SQL by the way).
Edit: The data is coming from a tsv file.
Why do you want to avoid involving the DBA? The DBA should have full knowledge of what's going on in the database, as they are ultimately responsible for the condition of the data within it. So you shouldn't be playing sneaky commando with them.
As you have a file of missing data, the easiest way to present it to the database is with an external table. This requires the creation of the table and probably a directory object as well. You will need the DBA's help with this task.
The only way to avoid creating database objects is to convert your TSV file into a series of DML statements. An IDE which supports regex and/or records macros will prove invaluable here. I like TextPad; other editors are available.
The DML statement for doing upserts in Oracle is the MERGE statement. The one thing you need to watch for is recency. Your missing data comes from last week. If a row exists it may have have been added or amended in the intervening period. You must write your MERGE statement so it does not overwrite more recent data with the older stuff. Hopefully your table has useful metadata columns such as DATE_CREATED and LAST_UPDATED.

Creating re-runnable Oracle DDL SQL Scripts

Our development team does all of their development on their local machines, databases included. When we make changes to schema's we save the SQL to a file that is then sent to the version control system (If there is a better practice for this I'd be open to hearing about that as well).
When working on SQL Server we'd wrap our updates around "if exists" statements to make them re-runnable. I am not working on an Oracle 10g project and I can't find any oracle functions that do the same thing. I was able to find this thread on dbaforums.org but the answer here seems a bit kludgy.
I am assuming this is for some sort of Automating the Build process and redoing the build from scratch if something fails.
As Shannon pointed out, PL/SQL objects such as Procedures, functions and Packages have the "create or replace" option, so a second recompile/re-run would be ok. Grants should be fine too.
As for Table creations and DDLs, you could take one of the following approaches.
1) Do not add any drop commands to the scripts and ask your development team to come up with the revert-script for the individual modules.
So for each create table that they add to the build, they will have an equivalent "DROP TABLE.." added to a script say."build_rollback.sql". If your build fails , you can run this script before running the build from scratch.
2)The second (and most frequently used approach I have seen) is to include the DROP table just before the create table statement and then Ignore the"Table or view does not exist" errors in the build log. Something like..
DROP TABLE EMP;
CREATE TABLE EMP (
.......
.......
);
The thread you posted has a major flaw. The most important one is that you always create tables incrementally. Eg, your database already has 100 tables and you are adding 5 more as part of this release. The script spools the DROP Create for all 100 tables and then executes it which does not make a lot of sense (unless you are building your database for the first time).
An SQL*Plus script will continue past errors unless otherwise configured to.
So you could have all of your scripts use :
DROP TABLE TABLE_1;
CREATE TABLE TABLE_1 (...
This is an option in PowerDesigner, I know.
Another choice would be to write a PL/SQL script which scrubs a schema, iterating over all existing tables, views, packages, procedures, functions, sequences, and synonyms in the schema, issuing the proper DDL statement to drop them.
I'd consider decomposing the SQL to create the database; one giant script containing everything for the schema sounds murderous to maintain in a shared environment. Dividing at a Schema / Object Type / Name level might be prudent, keeping fully dependent object types (like Tables and Indexes) together.

Script Oracle tables (DDL) with data insert statements into single/multiple sql files

I am needing to export the tables for a given schema, into DDL scripts and Insert statements - and have it scripted such that, the order of dependencies/constraints is maintained.
I came across this article suggesting how to archive the database with data - http://www.dba-oracle.com/t_archiving_data_in_file_structures.htm - not sure if the article is applicable for oracle 10g/11g.
I have seen "export table with data" features in "Sql Developer", "Toad for Oracle", "DreamCoder for Oracle" etc, but i would need to do this one table at a time, and will still need to figure out the right order of script execution manually.
Are there any tools/scripts that can utilize oracle metadata and generate DDL script with data?
Note that some of the tables have CLOB datatype columns - so the tool/script would need to be able to handle these columns.
P.S. I am needing something similar to the "Generate Scripts" feature in SQL Server 2008, where one can specify "script data" option and get back a self-sufficient script with DDL and data, generated in the order of table constraints. Please see: http://www.kodyaz.com/articles/sql-server-script-data-with-generate-script-wizard.aspx
Thanks for your help!
Firstly, recognise that this isn't necessarily possible. A view can use a function in a package that also selects from the view. Another issue is that you might need to load data into tables and then apply constraints, even though this might be slower than the other way round.
In short, you will need to do some work here.
Work out the dependencies in your system. ALL_DEPENDENCIES is the primary mechanism.
Then use DBMS_METADATA.GET_DDL to extract the DDL statements. For small data volumes, I'd extract the constraints separately for applying after the data load.
In current versions you can create external tables to unload data from regular tables into OS files (and obviously go the other way round). But if you've got exotic datatypes (BLOB, RAW, XMLTYPEs, User Defined Types....) it will be more challenging.
I suggest that you use Oracle standard export and import (exp/imp) here, is there a reason why you won't consider it? Note in addition you can use the "indexfile" option on the import to output the SQL statements (unfortunately this doesn't include the inserts) to a file instead of actually executing them.

Resources