SQL Loader, Trigger saturation? [closed] - performance

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I have a situation i can't find an explanation for, here it is. (I'll use hypothetical info since the original are really big.)
I have a table, let's say:
table_a
-------------
name
last name
dept
status
notes
And this table has a trigger on insert, which does a lot of validation to the info to change the status field of the new record according to the results of the validation, some of the validations are:
- check for the name existing in a dictionary
- check for the last name existing in a dictionary
- check that fields (name,last name,dept) aren't already inserted in table_b
- ... and so on
The thing is, if I do an insert on the table via query, like
insert into table_a
(name,last_name,dept,status,notes)
values
('john','smith',1,0,'new');
it takes only 173 ms to do all the validation process, update the status field and insert the record in the table. (the validation process does all the searches via indexes)
But if I try this via SQLloader, reading a file with 5000 records, it takes like 40 minutes to validate and insert 149 records (of course i killed it...)
So I tried loading the data disabling the trigger (to check speed)
and I got that it loads like all the records in less than 10 seconds.
So my question is, what can I do to improve this process? My only theory is that I could be saturating the database because it loads so fast and launches many instances of the trigger, but I really don't know.
My objective is to load around 60 files with info and validate them through the process in the trigger (willing to try other options though).
I would really appreciatte any help you can provide!!
COMPLEMENT---------------------------------------------------------------------------------
Thanks for the answer, now i'll read all about this, now hope you can help me with this part. let me explain some of the functionality i need (and i used a trigger cause i couldn't think of anything else)
so the table data comes with this (important) fields:
pid name lastname birthdate dept cicle notes
the data comes like this
name lastname birthdate dept
now, the trigger does this to the data:
Calls a function to calculate the pid (is calculated based on the name, lastname and birthdate with an algorithm)
Calls a function to check for the names on the dictionary (thats because in my dictionary i have single names, meaning if a person is named john aaron smith jones the function splits john aaron in two, and searches for john and aaron in the dictionary in separate querys, thats why i didn't use a foreign key [to avoid having lots of combinations john aaron,john alan,john pierce..etc]) --->kinda stuck on how to implement this one with keys without changing the dictionary...maybe with a CHECK?, the lastname foreign key would be good idea.
Gets the cicle from another table according to the dept and the current date (because a person can appear twice in the table in the same dept but in different cicle) --->how could i get this cicle value in a more efficient way to do the correct search?
And finally, after all this validation is done, i need to know exactly which validation wasn't met (thus the field notes) so the trigger concatenates all the strings of failed validations, like this:
lastname not in dictionary, cannot calculate pid (invalid date), name not in dictionary
i know that if the constraint check isn't met all i could do is insert the record in another table with the constraint-failed error message, but that only leaves me with one validation, am i right? but i need to validate all of them and send the report to other department so they can review the data and make all the necessary adjustments to it.
Anyway, this is my situation right now, i'll explore possibilities and hope you can share some light on the overall process, Thank you very much for your time.

You're halfway to the solution already:
"So I tried loading the data disabling the trigger (to check speed) ... it loads like all the records in less than 10 seconds."
This is not a surprise. Your current implementation executes a lot of single row SELECT statements for each row you insert into table B. That will inevitably give you a poor performance profile. SQL is a set-based language and performs better with multi-row operations.
So, what you need to do is find a way to replace all the SELECT statements which more efficient alternatives. Then you'll be able to drop the triggers permanently. For instance, replace the look-ups on the dictionary with foreign keys between the table A columns and the reference table. Relational integrity constraints, being internal Oracle code, perform much better than any code we can write (and work in multi-user environments too).
The rule about not inserting into table A if a combination of columns already exists in table B is more problematic. Not because it's hard to do but because it sounds like poor relational design. If you don't want to load records in table A when they already exits in table B why aren't you loading into table B directly? Or perhaps you have a sub-set of columns which should be extracted from table A and table B and formed into table C (which would have foreign key relationships with A and B)?
Anyway, leaving that to one side, you can do this with set-based SQL by replacing SQL*Loader with an external table. An external table allows us to present a CSV file to the database as if it were a regular table. This means we can use it in normal SQL statements. Find out more.
So, with foreign key constraints on dictionary and an external table you can replace teh SQL Loader code with this statement (subject to whatever other rules are subsumed into "...and so on"):
insert into table_a
select ext.*
from external_table ext
left outer join table_b b
on (ext.name = b.name and ext.last_name = b.last_name and ext.dept=b.dept)
where b.name is null
log errors into err_table_a ('load_fail') ;
This employs the DML error logging syntax to capture constraint errors for all rows in a set-based fashion. Find out more. It won't raise exceptions for rows which already exist in table B. You could either use the multi-table INSERT ALL to route rows into an overflow table or use a MINUS set operation after the event to find rows in the external table which aren't in table A. Depends on your end goal and how you need to report things.
Perhaps a more complex answer than you were expecting. Oracle SQL is a very extensive SQL implementation, with a lot of functionality for improving the efficient of bulk operations. It really pays us to read the Concepts Guide and the SQL Reference to find out just how much we can do with Oracle.

Related

Oracle 12c - refreshing the data in my tables based on the data from warehouse tables

I need to update the some tables in my application from some other warehouse tables which would be updating weekly or biweekly. I should update my tables based on those. And these are having foreign keys in another tables. So I cannot just truncate the table and reinsert the whole data every time. So I have to take the delta and update accordingly based on few primary key columns which doesn't change. Need some inputs on how to implement this approach.
My approach:
Check the last updated time of those tables, views.
If it is most recent then compare each row based on the primary key in my table and warehouse table.
update each column if it is different.
Do nothing if there is no change in columns.
insert if there is a new record.
My Question:
How do I implement this? Writing a PL/SQL code is it a good and efficient way? as the expected number of records are around 800K.
Please provide any sample code or links.
I would go for Pl/Sql and bulk collect forall method. You can use minus in your cursor in order to reduce data size and calculating difference.
You can check this site for more information about bulk collect, forall and engines: http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52plsql-1709862.html
There are many parts to your question above and I will answer as best I can:
While it is possible to disable referencing foreign keys, truncate the table, repopulate the table with the updated data then reenable the foreign keys, given your requirements described above I don't believe truncating the table each time to be optimal
Yes, in principle PL/SQL is a good way to achieve what you are wanting to
achieve as this is too complex to deal with in native SQL and PL/SQL is an efficient alternative
Conceptually, the approach I would take is something like as follows:
Initial set up:
create a sequence called activity_seq
Add an "activity_id" column of type number to your source tables with a unique constraint
Add a trigger to the source table/s setting activity_id = activity_seq.nextval for each insert / update of a table row
create some kind of master table to hold the "last processed activity id" value
Then bi/weekly:
retrieve the value of "last processed activity id" from the master
table
select all rows in the source table/s having activity_id value > "last processed activity id" value
iterate through the selected source rows and update the target if a match is found based on whatever your match criterion is, or if
no match is found then insert a new row into the target (I assume
there is no delete as you do not mention it)
on completion, update the master table "last processed activity id" to the greatest value of activity_id for the source rows
processed in step 3 above.
(please note that, depending on your environment and the number of rows processed, the above process may need to be split and repeated over a number of transactions)
I hope this proves helpful

Sybase ASE remote row insert locking

Im working on an application which access a Sybase ASE 15.0.2 ,where the current code access a remote database
(CIS) to insert a row using a proxy table definition (the destination table is a DOL - DRL table - The PK
row is defined as identity ,and is always growing). The current code performs a select to check if the row
already exists to avoid duplicate data to be inserted.
Since the remote table also have a PK definition on the table, i do understand that the PK verification will
be done again prior to commiting the row.
Im planning to remove the select check since its being effectively done again by the PK verification,
but im concerned about if when receiving a file with many duplicates, the table may suffer
some unecessary contention when the data is tried to be commited.
Its not clear to me if Sybase ASE tries to hold the last row and writes the data prior to check for the
duplicate. Also, if the table is very big, im concerned also about the time it will spend looking the
whole index to find duplicates.
I've found some documentation for SQL anywhere, but not ASE in the following link
http://dcx.sybase.com/1200/en/dbusage/insert-how-transact.html
The best i could find is the following explanation
https://groups.google.com/forum/?fromgroups#!topic/comp.databases.sybase/tHnOqptD7X8
But it doesn't enlighten in details how the row is locked (and if there is any kind of
optimization to write it ahead or at the same time of PK checking)
, and also if it will waste a full PK look if im positively inserting a row which the PK
positively greater than the last row commited
Thanks
Alex
Unlike SqlAnywhere there is no option for ASE to set wait_for_commit. The primary key constraint is checked during the insert and not at the commit time. The problem as I understand from your post I see is if you have a mass insert from a file that may contain duplicates is to load into a temp table , check for duplicates, remove the duplicates and then insert the unique ones. Mass insert are lot faster though it still checks for primary key violations. However there is no cost associated as there is no rolling back. The insert statement is always all or nothing. Even if one row is duplicate the entire insert statement will fail. Check before insert in more of error free approach as opposed to use of constraint to the verification because it is going to fail and rollback is going to again be costly.
Thanks Mike
The link does have a very quick explanation about the insert from the CIS perspective. Its a variable to keep an eye on given that CIS may become a representative time consumer
if its performing data and syntax checking if it will be done again when CIS forwards the insert statement to the target server. I was afraid that CIS could have some influence beyond the network traffic/time over the locking/PK checking
Raju
I do agree that avoiding the PK duplication by checking if the row already exists by running a select and doing in a batch, but im currently looking for a stop gap solution, and that may be to perform the insert command in batches of about 50 rows and leave the
duplicate key check for the PK.
Hopefully the PK check will be done over a join of the 50 newly inserted rows, and thus
avoid to traverse the index for each single row...
Ill try to test this and comment back
Alex

Inserting data in a column avoiding duplicates

Lets say i have a query which is fetching col1 after joining multiple tables. I want to insert values of that col1 in a table which is on remote db i.e. i would be using dblink to do that.
Now that col1 would be fetched from 4-5 different db's. There is chances that a value1 fetch from db1 would b in db2 as well. How can i avoid duplicates ?
In my remote db, I have created col1 a primary key. so when inserting, an error would be thrown if there is a duplicate key, end result failing rest of the process. Which i don't want to. I was thiking about 2 approaches
Write a PLSQL script, For each value, determine if value already exists or not. If it doesn't then insert.
Write a PLSQL script and insert and catch the duplicate key exception. The exception would be ignore and it will keep inserting (it doesn't sound that good).
Which approach would you prefer? Is there anything else i can do ?
I would use the MERGE statement and WHEN NOT MATCHED THEN INSERT.
The same merger could also update but it doesn't have to, just leave the update part out.
The different databases can have duplicate primary keys but that doesn't mean the records are duplicates. The actual data may be different in each case. Or the records may represent the same real world thing but at different statuses, Don't know, you haven't provided enough explanation.
The point is, you need much more analysis of why duplicate records can exist and probably a more sophisticated approach to handling collisions. Do you need to take all records (in which case you need a synthetic key)? Or do you take only one instance (so how do you decide precedence)? Other scenarios may exist.
In any case, MERGE or PL/SQL loops are likely to be too crude a solution.
First off, I would suggest that your target database drive all of these inserts because inserting/updating across a database link can create some locking issues and further complicate things especially with multiple databases attempting to access and perform DML on the same table. However if that isn't possible the solutions below will work.
I would fix your primary key problem by including a table look-up on the target table for each row.
INSERT INTO customer#dblink.oracle.com cust
(emp_name,
emp_id)
VALUES
(SELECT
cust.employee_name,
cust.employee_id --primary_key
FROM
source_table st
WHERE NOT EXISTS
(SELECT 1
FROM customer#dblink.oracle.com cust
WHERE cust.employee_id = st.emp_id));
Again, I would not recommend DML transactions across database links unless absolutely necessary as you can sometimes have weird locking behavior.
A PL/SQL procedure or anonymous PL/SQL block could be used to create a bulk processing solution as follows:
CREATE OR REPLACE PROCEDURE send_unique_data
AS
TYPE tab_cust IS TABLE OF customer#dblink.oracle.com%ROWTYPE
INDEX BY PLS_INTEGER;
t_records tab_cust;
BEGIN
SELECT
cust.employee_name,
cust.employee_id --primary_key
BULK COLLECT
INTO t_records
FROM source_table;
FORALL i IN t_records.FIRST...t_records.LAST SAVE EXCEPTIONS
INSERT INTO customer#dblink.oracle.com
VALUES t_records(i);
END send_unique_data;
You can also call the system SQL%BULKEXCEPTIONS collection in case you want to do anything with the records that threw exceptions (such as unique_constraint violations). Be warned that this solution will cause the target table to suffers from performance issues if there are lots of duplicate data attempting to be inserted.

Oracle, 2 procedures avoid deadlock

I have two procedures that I want to run on the same table, one uses the birth date and the other updates the name and last name taken from a third table.
The one that uses the birthday to update the age field runs all over the table, and the one that updates the names and last name only updates the rows that appear on the third table based on a key.
So I launched both and got deadlocked! Is there a way to prioritize any of them? I read about the nowait and skip locked for the update but then, how would I return to the ones skipped?
Hope you can help me on this!!
One possibility is to lock all rows you will update at once. Doing all updates in a single update statment will accomplish this. Or
select whatever from T
where ...
for update;
Another solution is to create what I call a "Gatekeeper" table. Both procedures need to lock the Gatekeeper table in exclusive mode before updating the table in question. The second procedure will block until the first commits but won't deadlock. In 11g you can create a table with no space allocated.
A variation is to insert a row in the Gatekeeper. Then lock only that row with select for update. Then you can use the Gatekeeper in other situations.
I would guess that you got locked because the update for all the rows and the update for a small set of rows accessed rows in different orders.
The former used a full scan and reached Raw A first, then went on to other rows, eventually trying to lock Row B. However, the other query was driven from an index or a join and already had Row B locked, and was off to lock Row A when it found it was already locked.
So, the fix: firstly, having an age column that needs to be constantly modified is a really bad idea. Perhaps it was done to allow indexing of age, but with a correctly written query an index on date of birth will let you find the same records just as quickly. You've broken normalisation rules and ended up coding yourself a deadlocking application. Hopefully you are only updating the rows that need to be updated, not all of them regardless -- I mean, that would just be insane.
The best solution is to get rid of that design flaw.
The not so good solution is to deconflict your queries by running them at different times or by using DBMS_Lock so that only one of them can run at any time.

oracle - moving data from to identical database

I have two databases with identical table layouts. There are a dozen or so tables of interest. They are a number of FK between them.
I have been asked to write a stored procedure to copy data from database A to database B based on the PK of the parent table at the top of the hierarchy. I may receive just one value, or a list of values. I'm supposed to select all records from database A that match the value(s) and insert/update them into database B. This includes all the records in the child tables too.
My questions is whats the best(most efficent/ best practice) way to do this?
Should I write a dozen select from... insert into... statements?
Should I join the tables together an try to insert into all the tables at the same time?
Thanks!
Additional info:
The record should be inserted if it is not already there. (based on the PK of the respective table). Otherwise it should be updated.
Obviously I need to traverse down to all child tables, so There would only be one record to copy in the parent table, but the child table might have 10, and the child's child table might have 500. I would of course need to update the record if it already existed, insert if it does not for the child tables too...
UPDATE:
I think it would make the solution simpler if I just deleted all records related to the top level key, and then insert all the new records rather than trying to do updates.
So I guess the questions is it best to just do a dozen:
delete from ... where ... in ...
select from ... where ... in ...
insert into...
or is it better to do some kinda of fancy joins to do all the inserts in one sql statement?
I would do this by disabling all the foreign key constraints, then doing a set of MERGE statements to deal with the updates and inserts, then enable all the constraints.
Think about logging. How much redo do you want to generate?
You might find that it's quicker and better to truncate all the target tables and then do inserts of everything with nolog. Could be simpler than the merges.
One major main alternative would be to drop all the target tables and use export and import. Might be a lot faster.
A second alternative would be to use materialized views, particularly if you don't need to do updates on the target tables. That way, Oracle does all the heavy lifting for you. You can force integrity by choosing refresh groups carefully.
There are several ways to deal with this business problem. A PL/SQL program may not be the best.

Resources