I have a use case where I want to update specific rows, identified by any identifier/WHERE-clause condition, and push those updates to Oracle or SQL Server from Databricks.
While I can use spark.read.format("jdbc") against either database, I could not find an easy way to update specific rows back in these DBs.
If I use
df.write.format("jdbc")
  .option("url", dbServerJdbcUrl)
  .option("user", username)
  .option("password", password)
  .option("driver", <either com.microsoft.sqlserver.jdbc.SQLServerDriver or oracle.jdbc.driver.OracleDriver>)
  .option("dbtable", <table on the database platform>)
  .mode("overwrite")  # or other modes
  .save()
it only overwrites the whole "dbtable" on the database. So far I have not found a way to make it work with .option("query", "<update statement>") either.
If I write to a temporary or staging ("parking") table instead, it becomes two stages of work: I then have to go back to the database platform and update the actual tables from that staging table.
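That second stage would essentially be a MERGE run on the database side. A rough sketch of what I mean (table and column names are invented; SQL Server syntax shown, Oracle's MERGE is nearly identical):
MERGE INTO dbo.target_table AS tgt
USING dbo.staging_table AS src
    ON tgt.id = src.id                 -- match on the business/primary key
WHEN MATCHED THEN
    UPDATE SET tgt.status     = src.status,
               tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (id, status, updated_at)
    VALUES (src.id, src.status, src.updated_at);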
Another note: when I do the above write against a table which has millions of rows, and I only want to update a handful of them, either mode just causes more trouble.
overwrite - the millions of existing rows are simply lost, replaced by the handful of rows from the df.
append - either creates duplicates or eventually fails due to constraint violations.
Is there a better way to have Databricks update specific rows in a database?
Related
I want to keep logs of all tables in one single log table. If any DML operation runs on any table inside the DB, it should be logged to that single table.
But it should be a dynamic trigger which does not hard-code the column names for every table.
Is there any solution for this?
Regards,
Somdutt Harihar
"Is there any solution for this"
No. This is not how databases work. Strongly enforced data structures are what they do, and that applies to audit tables just as much as to transaction tables.
The reason is quite clear: the time you save not writing audit code specific to each transactional table is the time you will spend writing a query to retrieve the audit records. The difference is, when you're trying to get the audit records out you will have your boss standing over your shoulder demanding to know when you can tell them what happened to the payroll records last month. Or asking how long it will take you to produce that report for the regulators, are you trying to make the company look like a bunch of clowns? You get the picture. This is not where you want to be.
Also, the performance of a single table to store all the changes to all the tables in the database? That is going to be so slow, you have no idea.
The point is, we can generate the auditing code. It is easy to write some SQL which interrogates the data dictionary and produces DDL for the target tables and triggers to populate those tables.
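As a rough illustration (the _AUD suffix and the EMPLOYEES table are invented examples, and length/precision handling is omitted), something along these lines can be generated from USER_TAB_COLUMNS:
-- Emits a skeleton CREATE TABLE statement per source table, mirroring its
-- columns and adding audit metadata columns.
SELECT 'CREATE TABLE ' || table_name || '_AUD ('
       || LISTAGG(column_name || ' ' || data_type, ', ')
              WITHIN GROUP (ORDER BY column_id)
       || ', aud_action VARCHAR2(1), aud_user VARCHAR2(128), aud_ts TIMESTAMP)'
          AS ddl_stmt
FROM   user_tab_columns
WHERE  table_name = 'EMPLOYEES'
GROUP  BY table_name;
A similar query over the same dictionary view can generate the body of the row-level trigger that populates each audit table.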
In fact it gets even easier in 11.2.0.4 and later, because we can use FLASHBACK DATA ARCHIVE (formerly Oracle Total Recall) to build and maintain such journalling functionality automatically, and query it with the AS OF syntax.
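A minimal sketch of that feature (the archive, tablespace and table names are illustrative):
-- Create a flashback archive and attach a table to it.
CREATE FLASHBACK ARCHIVE audit_fda
    TABLESPACE users
    RETENTION 1 YEAR;

ALTER TABLE employees FLASHBACK ARCHIVE audit_fda;

-- Query the table as it looked a day ago.
SELECT *
FROM   employees AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY);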
Okay, so technically there is a solution. You could have a trigger on each table which executes some dynamic PL/SQL to interrogate the data dictionary and assembles a piece of JSON which you stuff into your single table. The single table could be partitioned by day range and sub-partitioned by table name (assuming you have licensed the Partitioning option) to mitigate the performance of querying it.
But that is extremely complex. Running dynamic PL/SQL for every DML statement will have a bad effect on performance, which the users will notice. And this still doesn't solve the fundamental problem of retrieving the audit trail when you need it.
To audit DML actions on any table, just enable such an audit with the following statement:
audit insert table, update table, delete table;
All such actions on tables will then be logged and visible in the DBA_AUDIT_OBJECT view.
Auditing will only log the timestamp, user, host and other parameters, not copies of the new or old rows.
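Once enabled, the trail can be read back with a query like this (the audited table name is just an example):
SELECT username,
       obj_name,
       action_name,
       timestamp,
       userhost
FROM   dba_audit_object
WHERE  obj_name = 'EMPLOYEES'
ORDER  BY timestamp DESC;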
I have the following scenario and need to solve it in ORACLE:
Table A is on a DB-server
Table B is on a different server
Table A will be populated with data.
Whenever something is inserted into Table A, I want to copy it to Table B.
Table B has nearly the same columns, but sometimes I just want to take the content of 2 columns from Table A, concatenate it, and save it to Table B.
I am not very familiar with Oracle, but after researching on Google some say that you can do it with TRIGGERS or VIEWS. How would you do it?
So in general, there is a table which will be populated and its content should be copied to a different table.
This is the solution I came up with so far:
create public database link other_db
  connect to user identified by pw
  using 'tns-entry';
CREATE TRIGGER modify_remote_my_table
AFTER INSERT ON my_table
BEGIN INSERT INTO ....?
END;
/
How can I select the latest row that was inserted?
If these two tables are in databases on two different servers, then you will need a database link (db-link) created in the Table A schema so that it can access (read/write) the Table B data over that link.
Step 1: Create a database link in the Table A server DB pointing to the Table B server DB.
Step 2: Create a trigger on Table A which inserts the data into Table B using the database link. You can customise (concatenate) the values inside the trigger before inserting them into Table B.
This link should help you
http://searchoracle.techtarget.com/tip/How-to-create-a-database-link-in-Oracle
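A row-level trigger along the lines of Step 2 could look roughly like this (the link, table and column names are illustrative, not taken from your schema):
CREATE OR REPLACE TRIGGER trg_copy_to_b
AFTER INSERT ON my_table
FOR EACH ROW
BEGIN
    -- Concatenate two source columns into one target column over the db link.
    INSERT INTO table_b@other_db (id, combined_value)
    VALUES (:NEW.id, :NEW.col1 || ' ' || :NEW.col2);
END;
/
Because it is a FOR EACH ROW trigger, :NEW already exposes the values of the row just inserted, so there is no need to separately select the latest row.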
Yes you can do this with triggers. But there may be a few disadvantages.
What if database B is not available? -> Exception handling in your trigger.
What if database B was not available for 2 hours? You inserted data into database A which is now missing in database B. -> Do crazy things like temporarily inserting it into a cache table in database A.
Performance. Well, the performance for inserting a lot of data will be ugly. Each time you insert data, Oracle will start the PL/SQL engine to insert the data into the remote database.
Maybe you could think about using MViews (Materialized Views) to replicate the data via database link. Later you can build your queries so that they access tables from database B and add the required data from database A by joining the MViews.
You can also use fast refresh to replicate the data in (almost) real time.
From the perspective of an Oracle database admin, this would make a lot more sense than the trigger approach.
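A sketch of that MView setup (the database link and column names are illustrative, and id is assumed to be the primary key):
-- On database A (the source): allow fast refresh of my_table.
CREATE MATERIALIZED VIEW LOG ON my_table;

-- On database B (the target): replicate over the database link.
CREATE MATERIALIZED VIEW my_table_mv
    REFRESH FAST
    NEXT SYSDATE + 5/1440              -- refresh roughly every 5 minutes
    AS SELECT id, col1, col2
       FROM   my_table@link_to_a;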
Try this code:
Database links are considered rather insecure, Oracle's own replication options have licences associated with them these days, and some of the other options are deprecated as well.
https://gist.github.com/anonymous/e3051239ba401e416565cdd912e0de8c
It uses ORA_ROWSCN to sync tables across two different Oracle databases.
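Independent of that gist, the general idea of an ORA_ROWSCN-based sync is to remember the highest SCN already copied and pull only the rows changed since then (table and bind names are illustrative):
-- Rows in my_table changed since the last sync, where :last_sync_scn is the
-- highest ORA_ROWSCN recorded during the previous run.
SELECT id, col1, col2, ORA_ROWSCN
FROM   my_table
WHERE  ORA_ROWSCN > :last_sync_scn;
Note that unless the table was created with ROWDEPENDENCIES, ORA_ROWSCN is tracked per block rather than per row, so this can also return rows that did not actually change.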
Suppose I want to perform an UPDATE of an object stored as a row in SQL Server 2012, and some of the column values are different but most of them are the same. The question is simple: will SQL Server write only the changed columns to disk, or will it write the entire row, uselessly rewriting existing column values with the same values? And what if the object is stored in the DB spread over many tables (a complex object)?
Maybe I should rephrase my question: is the performance hit of doing non-updating UPDATEs of some columns, when updating the row, even worth worrying about?
BTW, I use the id column as the clustered key.
Background: My team has an etl job that updates an aggregate table. Each row contains data for a particular date, but this row can and will get updated after the row date (which means any row can contain data from multiple jobs). This ETL job missed some data for one day last week and now I need to backfill it.
Problem: I have the missing data, and what I was planning on doing was dumping that data into a temporary table and then merging it with the agg table. That way I can deal with whether the ETL job already contains a row for that data (update) or whether a new row needs to be added (insert), but I don't have sufficient permissions to create a temp table, and I'd prefer not to involve the DBA.
Question: Can I get insert/update (upsert) behaviour without creating a temporary table? (This is Oracle SQL, by the way.)
Edit: The data is coming from a tsv file.
Why do you want to avoid involving the DBA? The DBA should have full knowledge of what's going on in the database, as they are ultimately responsible for the condition of the data within it. So you shouldn't be playing sneaky commando with them.
As you have a file of missing data, the easiest way to present it to the database is with an external table. This requires the creation of the table and probably a directory object as well. You will need the DBA's help with this task.
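For reference, such an external table over a TSV might look roughly like this (the directory object, file, column names and types are all assumptions):
-- Assumes a directory object ETL_DIR (created by the DBA) pointing at the
-- folder that holds the file.
CREATE TABLE missing_data_ext (
    row_date    VARCHAR2(10),          -- convert with TO_DATE() when loading
    metric_key  VARCHAR2(100),
    metric_val  NUMBER
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY etl_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY X'09'     -- tab-separated
        MISSING FIELD VALUES ARE NULL
    )
    LOCATION ('missing_data.tsv')
)
REJECT LIMIT UNLIMITED;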
The only way to avoid creating database objects is to convert your TSV file into a series of DML statements. An IDE which supports regex and/or records macros will prove invaluable here. I like TextPad; other editors are available.
The DML statement for doing upserts in Oracle is the MERGE statement. The one thing you need to watch for is recency. Your missing data comes from last week. If a row exists it may have been added or amended in the intervening period. You must write your MERGE statement so it does not overwrite more recent data with the older stuff. Hopefully your table has useful metadata columns such as DATE_CREATED and LAST_UPDATED.
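A hedged sketch of such a MERGE, reusing the external table sketched above (the column names, LAST_UPDATED included, are assumptions about your schema):
MERGE INTO agg_table tgt
USING missing_data_ext src
   ON (    tgt.row_date   = TO_DATE(src.row_date, 'YYYY-MM-DD')
       AND tgt.metric_key = src.metric_key)
WHEN MATCHED THEN
    UPDATE SET tgt.metric_val   = src.metric_val,
               tgt.last_updated = SYSDATE
    -- Recency guard: leave alone anything amended since the backfill cut-off.
    WHERE tgt.last_updated < :backfill_cutoff
WHEN NOT MATCHED THEN
    INSERT (row_date, metric_key, metric_val, last_updated)
    VALUES (TO_DATE(src.row_date, 'YYYY-MM-DD'), src.metric_key,
            src.metric_val, SYSDATE);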
I have two databases with identical table layouts. There are a dozen or so tables of interest, and there are a number of FKs between them.
I have been asked to write a stored procedure to copy data from database A to database B based on the PK of the parent table at the top of the hierarchy. I may receive just one value, or a list of values. I'm supposed to select all records from database A that match the value(s) and insert/update them into database B. This includes all the records in the child tables too.
My question is: what's the best (most efficient / best-practice) way to do this?
Should I write a dozen select from ... insert into ... statements?
Should I join the tables together and try to insert into all the tables at the same time?
Thanks!
Additional info:
The record should be inserted if it is not already there (based on the PK of the respective table); otherwise it should be updated.
Obviously I need to traverse down to all the child tables, so there would only be one record to copy in the parent table, but a child table might have 10 and a child's child table might have 500. I would of course need to update a record if it already exists and insert it if it does not, for the child tables too...
UPDATE:
I think it would make the solution simpler if I just deleted all records related to the top level key, and then insert all the new records rather than trying to do updates.
So I guess the question is: is it best to just do a dozen of these:
delete from ... where ... in ...
select from ... where ... in ...
insert into...
or is it better to do some kind of fancy joins to do all the inserts in one SQL statement?
I would do this by disabling all the foreign key constraints, then doing a set of MERGE statements to deal with the updates and inserts, then enabling all the constraints again.
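A rough sketch of that pattern for one parent/child pair (the table, constraint and database link names are invented, and a single parent key is assumed in :parent_id; the same MERGE is repeated per table):
-- Disable FKs on the target side so tables can be loaded in any order.
ALTER TABLE child_table DISABLE CONSTRAINT fk_child_parent;

MERGE INTO parent_table tgt
USING (SELECT * FROM parent_table@source_db
       WHERE  parent_id = :parent_id) src
   ON (tgt.parent_id = src.parent_id)
WHEN MATCHED THEN
    UPDATE SET tgt.name = src.name
WHEN NOT MATCHED THEN
    INSERT (parent_id, name)
    VALUES (src.parent_id, src.name);

-- Repeat the same MERGE pattern for child_table, grandchild_table, ...

ALTER TABLE child_table ENABLE CONSTRAINT fk_child_parent;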
Think about logging. How much redo do you want to generate?
You might find that it's quicker and better to truncate all the target tables and then insert everything with NOLOGGING direct-path inserts. That could be simpler than the merges.
One major alternative would be to drop all the target tables and use export and import. That might be a lot faster.
A second alternative would be to use materialized views, particularly if you don't need to do updates on the target tables. That way, Oracle does all the heavy lifting for you. You can force integrity by choosing refresh groups carefully.
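If you go the MView route, a refresh group keeps the related MViews consistent with each other; a minimal sketch (the group and MView names are invented):
BEGIN
    DBMS_REFRESH.MAKE(
        name      => 'copy_group',
        list      => 'parent_table_mv, child_table_mv',
        next_date => SYSDATE,
        interval  => 'SYSDATE + 1/24');   -- refresh hourly
END;
/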
There are several ways to deal with this business problem. A PL/SQL program may not be the best.