I have written a PL/SQL block to delete some records from a bunch of tables.
To identify the records to be deleted, I have created a cursor on top of that query.
declare
  type t_guid_invoice is ref cursor;
  c_invoice t_guid_invoice;
  col1 varchar2(4000);  -- declarations added for completeness; actual datatypes depend on the join
  col2 varchar2(4000);
  col3 varchar2(4000);
begin
  open c_invoice for
    select * from a, b where a.col = b.col; -- (quite a complex join, renders 200k records)
  loop
    fetch c_invoice into col1, col2, col3;
    exit when c_invoice%NOTFOUND;
    begin
      DELETE
      FROM tab2
      WHERE cola = col1;
      if SQL%rowcount > 0 then
        dbms_output.put_line('INFO: tab2 for ' || col1 || '/' || col2 || ' removed.');
      else
        dbms_output.put_line('WARN: No tab2 for ' || col1 || '/' || col2 || ' found!');
      end if;
    exception
      when others then
        dbms_output.put_line('ERR: Problems while deleting tab2 for ' || col1 || '/' || col2);
        dbms_output.put_line(SQLERRM);
    end;
    ....
  end loop;
This continues to loop through about 26 tables; some of the tables are as big as 60 million records.
Deletion is based on the primary key in each table, and all triggers are disabled before the deletion process.
If I try to delete 10k records, it loops through 10k times, deleting multiple rows in each table, but it takes as long as 30 minutes. There is no commit after each block, since I have to cater to a simulation mode too.
Any suggestions to speed up the process? Thanks!
For sure, if you loop 10k times, all those DBMS_OUTPUT.PUT_LINE calls will slow things down (even if you aren't doing anything "smart"), and I wonder whether the buffer is large enough. If you want to log what's going on, create a log table and an autonomous transaction procedure which will insert that info (and commit).
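A minimal sketch of such a logging procedure (the table and procedure names here are made up for illustration):
create table delete_log (
  log_time timestamp default systimestamp,
  message  varchar2(4000)
);

create or replace procedure log_msg (p_message in varchar2) is
  pragma autonomous_transaction; -- lets the log insert commit independently of the main transaction
begin
  insert into delete_log (message) values (p_message);
  commit;
end;
/
You would then call log_msg instead of DBMS_OUTPUT.PUT_LINE; the messages survive a rollback of the main (simulation) transaction, and there is no output buffer to overflow.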
Apart from that, are the tables properly indexed? E.g. that would be the cola column in the tab2 table (in the code you posted). Did you collect statistics on the tables and indexes? It probably wouldn't harm to do that for the whole schema.
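For example (the index name and schema name are placeholders):
create index i_tab2_cola on tab2 (cola);

begin
  dbms_stats.gather_schema_stats(ownname => 'YOUR_SCHEMA', cascade => true);
end;
/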
Did you check the explain plan?
Do you know what takes the most time? Is it the ref cursor query (so that it has to be optimized), or is it the deletes themselves?
Can't you avoid the loop entirely? Row-by-row processing is slow. For example, instead of using a ref cursor, create a table out of its query, index it, and use it like this:
create table c_invoice as
select * from a join b on a.col = b.col;
create index i1inv_col1 on c_invoice (col1);
delete from tab2 t
where exists (select null
from c_invoice c
where c.col1 = t.cola
);
You typically never want to delete a large number of rows from a table in a loop.
You want to use one DELETE statement with an appropriate WHERE condition.
Additionally, while processing a large number of rows you typically do not want to use an index.
So your first step would be to check the rows that would not be deleted (your warnings).
You get the keys with the following query, and you may log them:
select a.col from a,b where a.col=b.col
minus
select cola from tab2;
In the second step you delete all rows with one statement.
delete
from tab2
where cola in (select a.col from a,b where a.col=b.col);
In case of problems, check the execution plan; you expect TABLE ACCESS FULL (INDEX FAST FULL SCAN is fine as well) on all sources, combined with a HASH JOIN.
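You can check that, for example, with:
explain plan for
delete
from tab2
where cola in (select a.col from a,b where a.col=b.col);

select * from table(dbms_xplan.display);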
Welcome, Oracle pros.
In an Oracle 12 database (an upgrade is already scheduled ;-)) we have a setup where different tables update a common base table via "after update" triggers, like the following:
Search_Flat
ID
Field_A
Field_B
Field_C
Now table1 contains n columns, where let's say 2 out of n are relevant for the Search_Flat table. As an update of table1 may only affect columns not relevant for Search_Flat, we want to add checks to the trigger. So our first approach is the following:
CREATE OR REPLACE TRIGGER tr_tbl_1_au_search
AFTER UPDATE OF
field_a,
field_b
ON schemauser.table1
FOR EACH ROW
BEGIN
IF :new.field_a <> :old.field_a THEN
UPDATE schemauser.search_flat SET field_a = :new.field_a WHERE id = :new.ID;
END IF;
IF :new.field_b <> :old.field_b THEN
UPDATE schemauser.search_flat SET field_b = :new.field_b WHERE id = :new.ID;
END IF;
END;
Alternatively, we could also set up the trigger like this:
CREATE OR REPLACE TRIGGER tr_tbl_1_au_search
AFTER UPDATE OF
field_a,
field_b
ON schemauser.table1
FOR EACH ROW
BEGIN
IF :new.field_a <> :old.field_a OR :new.field_b <> :old.field_b THEN
UPDATE schemauser.search_flat
SET field_a = :new.field_a,
field_b = :new.field_b
WHERE id = :new.ID;
END IF;
END;
The question now is about the setup of the triggers themselves. Which approach is better with respect to:
locking time of search_flat rows
overall performance of affected components (i.e., table_1, trigger and search_flat)
In production we are talking about 4 tables with 10 fields each considered in the triggers. And we have independent app servers accessing the shared database, updating the 4 tables simultaneously. From time to time we detect the following error, which is the reason we want to optimize the triggers:
ORA-02049: timeout: distributed transaction waiting for lock
Side note: this setup has been chosen instead of a view or materialized view for performance reasons, as the base table is used in a GUI with the requirement to be instantly updated, and the number of records in the 4 feeding tables is too high to refresh a materialized view on every update.
I'm looking forward to the discussion and your thoughts.
As I understand your post, you have 4 live tables (called "table1", "table2", etc.) that you want to search on, but querying from them is too slow, so you want to maintain a single, flattened table to search on instead and have triggers to keep that flattened table always up-to-date.
You want to know which of two trigger approaches is better.
I think the answer is "neither", since both are prone to deadlocks. Imagine this scenario:
User 1 -
UPDATE table1
SET field_a = 500
WHERE <condition affecting 200 distinct IDs>
User 2 at about the same time -
UPDATE table1
SET field_b = 700
WHERE <condition affecting 200 distinct IDs>
Triggers start processing. You cannot control the order in which the rows are updated. Maybe it goes like this:
User 1's trigger, time index 100 ->
UPDATE search_flat SET field_a = 500 WHERE id = 90;
User 2's trigger, time index 101 ->
UPDATE search_flat SET field_b = 700 WHERE id = 91;
User 1's trigger, time index 102 ->
UPDATE search_flat SET field_a = 500 WHERE id = 91; (waits on user 2's session)
User 2's trigger, time index 103 ->
UPDATE search_flat SET field_b = 700 WHERE id = 90; (deadlock error)
User 2's original update fails and rolls back.
You have multiple concurrent processes all updating the same set of rows in search_flat with no control over the processing order. That is a recipe for deadlocks.
If you want to do this safely, you should use neither of the FOR EACH ROW trigger approaches you outlined. Rather, make a compound trigger to do this.
Here's some sample code to illustrate the idea. Be sure to read the comments.
-- Aside: consider setting this at the system level if on 12.2 or later
-- alter system set temp_undo_enabled=false;
CREATE GLOBAL TEMPORARY TABLE table1_updates_gtt (
id NUMBER,
field_a VARCHAR2(80),
field_b VARCHAR2(80)
) ON COMMIT DELETE ROWS;
CREATE GLOBAL TEMPORARY TABLE table2_updates_gtt (
id NUMBER,
field_a VARCHAR2(80)
) ON COMMIT DELETE ROWS;
-- .. so on for table3 and 4.
CREATE OR REPLACE TRIGGER table1_search_maint_trg
FOR INSERT OR UPDATE OR DELETE ON table1 -- with similar compound triggers for table2, 3, 4.
COMPOUND TRIGGER
AFTER EACH ROW IS
BEGIN
-- Update the table-1 specific GTT with the changes.
CASE WHEN INSERTING OR UPDATING THEN
-- Assumes ID is immutable primary key
INSERT INTO table1_updates_gtt (id, field_a, field_b) VALUES (:new.id, :new.field_a, :new.field_b);
WHEN DELETING THEN
INSERT INTO table1_updates_gtt (id, field_a, field_b) VALUES (:old.id, null, null); -- or figure out what you want to do about deletes.
END CASE;
END AFTER EACH ROW;
AFTER STATEMENT IS
BEGIN
-- Write the data from the GTT to the search_flat table.
-- NOTE: The ORDER BY in the next line is what saves us from deadlocks.
FOR r IN ( SELECT id, field_a, field_b FROM table1_updates_gtt ORDER BY id ) LOOP
-- TODO: replace with BULK processing for better performance, if DMLs can affect a lot of rows
UPDATE search_flat sf
SET sf.field_a = r.field_a,
sf.field_b = r.field_b
WHERE sf.id = r.id
AND ( sf.field_a <> r.field_a
OR (sf.field_a IS NULL AND r.field_a IS NOT NULL)
OR (sf.field_a IS NOT NULL AND r.field_a IS NULL)
OR sf.field_b <> r.field_b
OR (sf.field_b IS NULL AND r.field_b IS NOT NULL)
OR (sf.field_b IS NOT NULL AND r.field_b IS NULL)
);
END LOOP;
END AFTER STATEMENT;
END table1_search_maint_trg;
Also, as numerous commenters have pointed out, it's probably better to use a materialized view for this. If you are on 12.2 or later, real-time materialized views (aka "ENABLE ON QUERY COMPUTATION") offer a lot of promise for this sort of thing. No COMMIT overhead to your application and real-time search results. It's just that search time degrades slightly if there are a lot of recent updates to the underlying tables.
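If you go that route, here is a rough sketch of the idea for a single source table (the names, column list, refresh options, and the required materialized view log are assumptions to adapt, not a drop-in replacement for your four-table setup):
CREATE MATERIALIZED VIEW LOG ON table1
  WITH PRIMARY KEY, ROWID (field_a, field_b)
  INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW search_flat_mv
  REFRESH FAST ON DEMAND
  ENABLE ON QUERY COMPUTATION
AS
SELECT id, field_a, field_b
  FROM table1;
Queries can then rely on query rewrite or use the /*+ fresh_mv */ hint to get up-to-date results computed on the fly, without waiting for a refresh.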
I need to perform a delete operation (oldest data first) on a de-normalized table which does not have any unique ID. We are considering a date column as the parameter
for deciding the oldest data, but for each date there would be 600K records. And as the table is very large, the delete operation will be done in batches (a loop with rownum).
I used the query below, but it throws the error 'data manipulation operation not legal on this view'. I am sure that temp is a table, not a view.
delete from (Select A.*,rownum as rn from (select * from temp order by date) A) B where B.rn<10
Please let me know any alternative suggestions.
Thanks Kifinity, your solution worked, but now I am confused about how to make it work for deleting a huge number of records. I have around 10 million records and need to delete them in batches. Thanks in advance.
In your statement, everything after delete from is an inline view. You can't delete from an inline view; your statement has to be delete from TABLE, e.g.
delete from temp
where rowid in (select rowid as rid
from (select * from temp order by date)
where rownum < 10)
Edit: as requested, here's an example as part of a PL/SQL loop.
begin
loop
delete from temp
where rowid in (select rowid as rid
from (select * from temp order by date)
where rownum < 1000);
if SQL%ROWCOUNT = 0 then exit; end if;
end loop;
end;
/
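One note for the ~10 million row case: the loop above keeps everything in a single transaction. If your process allows committing in batches, a variant like this (the 10000-row batch size is arbitrary) keeps undo usage per transaction small:
begin
  loop
    delete from temp
    where rowid in (select rowid
                    from (select * from temp order by date)
                    where rownum <= 10000);
    exit when SQL%ROWCOUNT = 0;
    commit; -- release locks and undo after each batch
  end loop;
  commit;
end;
/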
I have a PL/SQL package in Oracle whose important function is:
function checkDuplicate(in_id in varchar2) return boolean is
cnt number;
begin
select count(*)
into cnt
from tbl_Log t
where t.id = in_id;
if (cnt > 0) then
-- it means the request is a duplicate on in_id
return false;
end if;
insert into tbl_log (id,date) values(in_id , sysdate);
return true;
end;
When two requests call this function concurrently, both of them pass the check and the same in_id is inserted into tbl_log twice.
Note: tbl_log doesn't have a PK for performance issues.
Are there any solutions?
" both of them passed this function and two the same in_id inserted in tbl_log"
Oracle operates at the READ COMMITTED isolation level by default, so the select can only find committed records. If one thread has inserted a record for a given value but hasn't committed the transaction, another thread looking for the same value will come up empty.
"Note: tbl_log doesn't have a PK for performance issues. "
The lessons of history are clear: tables without integrity constraints inevitably fall into data corruption.
"I want to recognize the duplication with this function ... Are there any solutions?"
You mean apart from adding a primary key constraint? There is no more efficient way of trapping duplication than a primary key. Maybe you should look at the performance issues. Plenty of applications manage to handle millions of inserts and still enforce integrity constraints. You should also look at the Java layer: why have you got multiple threads submitting the same ID?
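For reference, that would simply be (the constraint name is just an example):
alter table tbl_log add constraint tbl_log_pk primary key (id);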
Note: tbl_log doesn't have a PK for performance issues.
There is no PK nor unique index on this column in order to "avoid performance issues", but there are hundreds or thousands of queries like SELECT ... WHERE t.id = ... running against this table. These queries must use a full table scan due to the lack of an index on this column!
This can cause much bigger performance issues, in my opinion.
Since the values of this column are UUIDs, there is very little chance of conflicting values. In this case I would prefer not to use any locks.
Just use a unique constraint (index) on this column to prevent two duplicate values from being inserted:
ALTER TABLE tbl_log ADD CONSTRAINT tbl_log_id_must_be_unique UNIQUE( id );
and then use this implementation of your function:
create or replace function checkDuplicate(in_id in varchar2) return boolean is
begin
insert into tbl_log (id,"DATE") values(in_id , sysdate);
return true;
exception when dup_val_on_index then
return false;
end;
/
In the vast majority of cases the function simply inserts a new record into the table without any delay, because the values are UUIDs.
In the seldom case of a duplicate value that is already committed in the table, the insert will fail immediately, without any delay.
In the very rare (almost impossible) case where two threads try to insert the same UUID simultaneously, the second thread will block on the INSERT and wait until the first thread commits or rolls back.
As per your condition, since you are reluctant to use primary key data integrity enforcement (which will lead to data corruption anyhow), I would suggest that you use a MERGE statement and keep an audit log of the latest thread updating the table. This way you will be able to eliminate duplicate records as well as keep track of when and from which thread (latest info) the id got updated. Hope the snippet below helps.
---Create dummy table for data with duplicates
DROP TABLE dummy_hist;
CREATE TABLE dummy_hist AS
SELECT LEVEL COL1,
       'AVRAJIT' || LEVEL COL2,
       SYSTIMESTAMP ACTUAL_INSERTION_DT,
       SYSTIMESTAMP UPD_DT,
       1 THREAD_VAL
FROM DUAL
CONNECT BY LEVEL < 100;
--Update upd_dt
UPDATE dummy_hist SET upd_dt = NULL,thread_val = NULL;
SELECT * FROM dummy_hist;
--Create function
CREATE OR REPLACE
FUNCTION checkDuplicate(
in_id IN VARCHAR2,
p_thread_val IN NUMBER)
RETURN BOOLEAN
IS
cnt NUMBER;
BEGIN
MERGE INTO dummy_hist A USING
(SELECT in_id VAL FROM dual
)B ON (A.COL1 = B.VAL)
WHEN MATCHED THEN
  UPDATE
  SET a.upd_dt = systimestamp,
      a.thread_val = p_thread_val
  WHERE a.col1 = b.val
WHEN NOT MATCHED THEN
INSERT
(
a.col1,
a.col2,
a.actual_insertion_dt,
a.UPD_DT,
a.thread_val
)
VALUES
(
b.val,
'AVRAJIT',
SYSTIMESTAMP,
NULL,
p_thread_val
);
COMMIT;
RETURN true;
END;
/
--Execute the function
DECLARE
rc BOOLEAN;
BEGIN
FOR I IN
(SELECT LEVEL + 7 LVL FROM DUAL CONNECT BY LEVEL <= 43 -- yields values 8 through 50
)
LOOP
rc:=checkduplicate(I.LVL,3);
END LOOP;
END;
/
I'm trying to use a nested table inside an IN clause in a PL/SQL block.
First, I have defined a TYPE:
CREATE OR REPLACE TYPE VARCHAR_ARRAY AS TABLE OF VARCHAR2(32767);
Here is my PL/SQL block using BULK COLLECT INTO:
DECLARE
COL1 VARCHAR2(50) := '123456789';
N_TBL VARCHAR_ARRAY := VARCHAR_ARRAY();
C NUMBER;
BEGIN
-- Print timestamp
DBMS_OUTPUT.PUT_LINE('START: ' || TO_CHAR(SYSTIMESTAMP ,'dd-mm-yyyy hh24:mi:ss.FF'));
SELECT COLUMN1
BULK COLLECT INTO N_TBL
FROM MY_TABLE
WHERE COLUMN1 = COL1;
SELECT COUNT(COLUMN1)
INTO C
FROM MY_OTHER_TABLE
WHERE COLUMN1 IN (SELECT column_value FROM TABLE(N_TBL));
-- Print timestamp
DBMS_OUTPUT.PUT_LINE('ENDED: ' || TO_CHAR(SYSTIMESTAMP ,'dd-mm-yyyy hh24:mi:ss.FF'));
END;
And the output is:
START: 01-08-2014 12:36:14.997
ENDED: 01-08-2014 12:36:17.554
It takes more than 2.5 seconds (2.557 seconds exactly)
Now, if I replace the nested table with a subquery, like this:
DECLARE
COL1 VARCHAR2(50) := '123456789';
N_TBL VARCHAR_ARRAY := VARCHAR_ARRAY();
C NUMBER;
BEGIN
-- Print timestamp
DBMS_OUTPUT.PUT_LINE('START: ' || TO_CHAR(SYSTIMESTAMP ,'dd-mm-yyyy hh24:mi:ss.FF'));
SELECT COUNT(COLUMN1)
INTO C
FROM MY_OTHER_TABLE
WHERE COLUMN1 IN (
-- Nested table replaced by a subquery
SELECT COLUMN1
FROM MY_TABLE
WHERE COLUMN1 = COL1
);
-- Print timestamp
DBMS_OUTPUT.PUT_LINE('ENDED: ' || TO_CHAR(SYSTIMESTAMP ,'dd-mm-yyyy hh24:mi:ss.FF'));
END;
The output is:
START: 01-08-2014 12:36:08.889
ENDED: 01-08-2014 12:36:08.903
It takes only 14 milliseconds...!!!
What could I do to enhance this PL/SQL block?
Is there any database configuration needed?
Are the two query plans different?
Assuming that they are, the difference is likely that the optimizer has reasonable estimates about the number of rows the subquery will return and, thus, is able to choose the most efficient plan. When your data is in a nested table (I'd hate to use the word array in the type declaration here since that implies that you're using a varray when you're not), Oracle doesn't have information about how many elements are going to be in the collection. By default, it's going to guess that the collection has as many elements as your data blocks have bytes. So if you have 8k blocks, Oracle will guess that your collection has 8192 elements.
Assuming that your actual query doesn't return anywhere close to 8192 rows and that it actually returns many more or many fewer rows, you can potentially use the cardinality hint to let the optimizer make a more accurate guess. For example, if your query generally returns a few dozen rows, you probably want something like
SELECT COUNT(COLUMN1)
INTO C
FROM MY_OTHER_TABLE
WHERE COLUMN1 IN (SELECT /*+ cardinality(t 50) */ column_value
FROM TABLE(N_TBL) t);
The literal you put in the cardinality hint doesn't need to be particularly accurate, just close to general reality. If the number of rows is completely unknown, the dynamic_sampling hint can help.
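For example, a sketch with the dynamic_sampling hint (level 2 here is just a common starting point, not a tuned value):
SELECT COUNT(COLUMN1)
INTO C
FROM MY_OTHER_TABLE
WHERE COLUMN1 IN (SELECT /*+ dynamic_sampling(t 2) */ column_value
FROM TABLE(N_TBL) t);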
If you are using Oracle 11g, you may also benefit from cardinality feedback helping the optimizer learn to better estimate the number of elements in a collection.
In Oracle, I have a requirement wherein I need to insert records from a source table into a target table and then update the PROCESSED_DATE field of the source once the target has been updated.
One way is to use a cursor and loop row by row to achieve this.
Is there any other way to do the same more efficiently?
No need for a cursor. Assuming you want to transfer those rows that have not yet been transferred (identified by a NULL value in processed_date):
insert into target_table (col1, col2, col3)
select col1, col2, col3
from source_table
where processed_date is null;
update source_table
set processed_date = current_timestamp
where processed_date is null;
commit;
To avoid updating rows that were inserted during the runtime of the INSERT or between the INSERT and the update, start the transaction in serializable mode.
Before you run the INSERT, start the transaction using the following statement:
set transaction isolation level SERIALIZABLE;
For more details see the manual:
http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_10005.htm#i2067247
http://docs.oracle.com/cd/E11882_01/server.112/e25789/consist.htm#BABCJIDI
A trigger should work. The target table can have a trigger that, on insert, updates the source table's column with the processed date.
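A minimal sketch of that idea, assuming the tables are named target_table and source_table and share an id column (all names here are assumptions):
create or replace trigger trg_target_set_processed
after insert on target_table
for each row
begin
  -- mark the matching source row as processed as soon as the target row arrives
  update source_table
     set processed_date = sysdate
   where id = :new.id;
end;
/
Keep in mind this turns the bulk insert into row-by-row updates of the source table, so the single-statement approach above will usually be faster.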
My preferred solution in this sort of instance is to use a PL/SQL array along with batch DML, e.g.:
DECLARE
CURSOR c IS SELECT * FROM tSource;
TYPE tarrt IS TABLE OF c%ROWTYPE INDEX BY BINARY_INTEGER;
tarr tarrt;
BEGIN
OPEN c;
FETCH c BULK COLLECT INTO tarr;
CLOSE c;
FORALL i IN 1..tarr.COUNT
INSERT INTO tTarget VALUES tarr(i);
FORALL i IN 1..tarr.COUNT
UPDATE tSource SET processed_date = SYSDATE
WHERE tSource.id = tarr(i).id;
END;
The above code is an example only and makes some assumptions about the structure of your tables.
It first queries the source table, and will only insert and update those records - which means you don't need to worry about other sessions concurrently inserting more records into the source table while this is running.
It can also be easily changed to process the rows in batches (using the fetch LIMIT clause and a loop) rather than all at once like I have here, for example:
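A rough sketch of that batched variant (the 1000-row batch size is arbitrary, the table and column names follow the same assumptions as above, and the keys are copied to a scalar collection to keep the FORALL update simple):
DECLARE
  CURSOR c IS SELECT * FROM tSource WHERE processed_date IS NULL;
  TYPE tarrt IS TABLE OF c%ROWTYPE INDEX BY BINARY_INTEGER;
  TYPE tidt IS TABLE OF tSource.id%TYPE INDEX BY BINARY_INTEGER;
  tarr tarrt;
  tids tidt;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO tarr LIMIT 1000;
    EXIT WHEN tarr.COUNT = 0;
    -- copy the keys into a scalar collection for the FORALL update
    FOR i IN 1..tarr.COUNT LOOP
      tids(i) := tarr(i).id;
    END LOOP;
    FORALL i IN 1..tarr.COUNT
      INSERT INTO tTarget VALUES tarr(i);
    FORALL i IN 1..tarr.COUNT
      UPDATE tSource SET processed_date = SYSDATE
      WHERE id = tids(i);
  END LOOP;
  CLOSE c;
END;
/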
I got another answer from someone else. That solution seems much more reasonable than changing the isolation level, since all my new records will have PROCESSED_DATE as null (only around 30 rows get inserted in the time it takes to insert the records into the target table).
Also, the PROCESSED_DATE = NULL rows can be updated only by my job. No other user can update these records at any point in time.
declare
date_stamp date;
begin
select sysdate
into date_stamp
from dual;
update source set processed_date = date_stamp
where processed_date is null;
Insert into target
select * from source
where processed_date = date_stamp;
commit;
end;
/
Let me know any further thoughts on this. Thanks a lot for all your help on this.