How to quickly delete a huge number of rows from an Oracle table using parallel sessions - oracle

I am using the query below to delete more than 250 million rows from my table, and it is taking a long time.
I have tried t_delete LIMIT values of up to 20000.
The deletion is still slow.
Please suggest a few optimisations to this code so it finishes faster.
DECLARE
  TYPE tt_delete IS TABLE OF ROWID;
  t_delete tt_delete;
  CURSOR cIMAV IS
    SELECT ROWID FROM moc_attribute_value
    WHERE id IN (SELECT id FROM ORPHANS_MAV);
  total  NUMBER := 0;
  rcount NUMBER := 0;
  Stmt1  VARCHAR2(2000);
  Stmt2  VARCHAR2(2000);
BEGIN
  --- CREATE TABLE orphansInconsistenDelProgress (currentTable VARCHAR(100), deletedCount INT, totalToDelete INT);
  --- INSERT INTO orphansInconsistenDelProgress (currentTable, deletedCount, totalToDelete) VALUES ('', 0, 0);
  Stmt1 := 'ALTER SESSION SET parallel_degree_policy = AUTO';
  Stmt2 := 'ALTER SESSION FORCE PARALLEL DML';
  EXECUTE IMMEDIATE Stmt1;
  EXECUTE IMMEDIATE Stmt2;
  --- ALTER SESSION SET parallel_degree_policy = AUTO;
  --- ALTER SESSION FORCE PARALLEL DML;
  COMMIT;
  --- MOC_ATTRIBUTE_VALUE
  SELECT count(*) INTO total FROM ORPHANS_MAV;
  UPDATE orphansInconsistenDelProgress
  SET currentTable = 'ORPHANS_MAV', totalToDelete = total;
  rcount := 0;
  OPEN cIMAV;
  LOOP
    FETCH cIMAV BULK COLLECT INTO t_delete LIMIT 2000;
    EXIT WHEN t_delete.COUNT = 0;
    FORALL i IN 1..t_delete.COUNT
      DELETE moc_attribute_value WHERE ROWID = t_delete(i);
    COMMIT;
    rcount := rcount + 2000;
    UPDATE orphansInconsistenDelProgress SET deletedCount = rcount;
  END LOOP;
  CLOSE cIMAV;
  COMMIT;
END;
/

A single Oracle parallel query can simplify the code and improve performance.
declare
  v_deleted number;
begin
  execute immediate 'alter session enable parallel dml';

  delete /*+ parallel */
  from moc_attribute_value
  where id in (select id from ORPHANS_MAV);

  -- capture the row count before the next statement resets SQL%ROWCOUNT
  v_deleted := sql%rowcount;

  update OrphansInconsistenDelProgress
  set currentTable  = 'ORPHANS_MAV',
      totalToDelete = v_deleted;

  commit;
end;
/
In general, we want to either let Oracle break the task into pieces or use our own custom chunking. The original code seems to be doing both - it reads the data in chunks, and then submits each chunk to be further divided into a parallel delete. That approach generates lots of tiny pieces, and Oracle likely wastes a lot of time on things like thread coordination.
Deleting a large number of rows is expensive because there's no way to avoid REDO and UNDO. You might want to look into using DDL options, such as truncating a partition, or dropping and recreating the table. (But be careful when recreating objects: it's difficult to perfectly recreate complex objects. We tend to forget things like privileges and table options.)
Tuning parallelism and large jobs is complicated. It's important to use the best monitoring tools to ensure that Oracle is requesting, allocating, and using the right number of parallel processes, and that the execution plan is correct. One strong advantage of using a single SQL statement is that you can use Real-Time SQL Monitoring reports to track progress. If you have the Diagnostics and Tuning Pack licenses, find the SQL_ID in GV$SQL and generate the report with select dbms_sqltune.report_sql_monitor('your SQL_ID here') from dual;.
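For example (the LIKE pattern below is only a guess at how the delete's statement text will appear):

-- find the SQL_ID of the running delete
select sql_id, sql_text, elapsed_time/1e6 as elapsed_seconds
from gv$sql
where sql_text like 'DELETE /*+ parallel */%moc_attribute_value%';

-- generate the Real-Time SQL Monitoring report
select dbms_sqltune.report_sql_monitor(sql_id => 'your SQL_ID here', type => 'TEXT')
from dual;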

Maybe use SQL TRUNCATE TABLE.
TRUNCATE TABLE is faster and uses fewer resources than the DELETE command.
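Note that TRUNCATE is DDL: it removes every row and cannot be rolled back, so it only applies if nothing in the table needs to be kept. Using the table from the question:

TRUNCATE TABLE moc_attribute_value;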

If you are keeping only a fraction of the rows, it is likely to be much faster to copy over the rows to keep, then swap tables and delete the old table.
(No, I don't know the exact threshold at which this becomes faster than DELETEing.)
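A rough sketch of that copy-and-swap approach, using the tables from the first question (recreating indexes, constraints, grants and triggers is left out):

CREATE TABLE moc_attribute_value_keep AS
SELECT v.*
FROM moc_attribute_value v
WHERE NOT EXISTS (SELECT 1 FROM ORPHANS_MAV o WHERE o.id = v.id);

-- recreate indexes, constraints, grants and triggers on the new table, then:
ALTER TABLE moc_attribute_value RENAME TO moc_attribute_value_old;
ALTER TABLE moc_attribute_value_keep RENAME TO moc_attribute_value;
DROP TABLE moc_attribute_value_old PURGE;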

Related

how to run the stored procedure in batch mode or run it in parallel processing

We are iterating over 100k+ records from a global temporary table. The stored procedure below iterates over every record from the global temp table one by one and has to process these three steps:
check whether the product exists or not;
check whether the assets inside the product have the 'category' applied or not;
check whether the assets have file names like '%pdf%' or not.
So each record has to go through these 3 steps, and the final document names are stored in the table for each successful record. If any error occurs in any of the steps, the error message is stored for that record.
The stored procedure below is taking a long time because it processes records sequentially.
Is there any way to make this faster in the stored procedure itself by doing batch processing?
If that is not possible in the stored procedure, can we move this code to Java and run it in multi-threaded mode, e.g. creating 10 threads where each thread takes one record concurrently and processes it? I would be happy if somebody could give some pseudo code.
Which approach would you suggest?
DECLARE
V_NODE_ID VARCHAR2(20);
V_FILENAME VARCHAR2(100);
V_CATEGORY_COUNT INTEGER :=0;
FINAL_FILNAME VARCHAR2(2000);
V_FINAL_ERRORMESSAGE VARCHAR2(2000);
CURSOR C1 IS
SELECT isbn FROM GT_ADD_ISBNS GT;
CURSOR C2(v_isbn in varchar2) IS
SELECT ANP.NODE_ID NODE_ID
FROM
table1 ANP,
table2 ANPP,
table3 AN
WHERE
ANP.NODE_ID=AN.ID AND
ANPP.NODE_ID=ANP.NODE_ID AND
AN.NAME_ID =26 AND
ANP.CATEORGY='category' AND
ANP.QNAME_ID='categories' AND
ANP.NODE_ID IN(SELECT CHILD_NODE_ID
FROM TABLE_ASSOC START WITH PARENT_NODE_ID IN(v_isbn)
CONNECT BY PRIOR CHILD_NODE_ID = PARENT_NODE_ID);
BEGIN
--Iterating all Products
FOR R1 IN C1
LOOP
FINAL_FILNAME :='';
BEGIN
--To check whether Product is exists or not
SELECT AN.ID INTO V_NODE_ID
FROM TABLE1 AN,
TABLE2 ANP
WHERE
AN.ID=ANP.NODE_ID AND
ANP.VALUE in(R1.ISBN);
V_CATEGORY_COUNT :=0;
V_FINAL_ERRORMESSAGE :='';
--To check Whether Product inside the assets are having the 'category' is applied or not
FOR R2 IN C2(R1.ISBN)
LOOP
V_CATEGORY_COUNT := V_CATEGORY_COUNT+1;
BEGIN
--In this Logic Product inside the assets have applied the 'category' But those assets are having documents LIKE '%pdf%' or not
SELECT ANP.STRING_VALUE into V_FILENAME
FROM
table1 ANP,
table2 ANPP,
table3 ACD
WHERE
ANP.QNAME_ID=21 AND
ACD.ID=ANPP.LONG_VALUE AND
ANP.NODE_ID=ANPP.NODE_ID AND
ANPP.QNAME_ID=36 AND
ANP.STRING_VALUE LIKE '%pdf%' AND
ANP.NODE_ID=R2.NODE_ID;
FINAL_FILNAME := FINAL_FILNAME || V_FILENAME ||',';
EXCEPTION WHEN
NO_DATA_FOUND THEN
V_FINAL_ERRORMESSAGE:=V_FINAL_ERRORMESSAGE|| 'Category is applied for this Product But for the asset:'|| R2.NODE_ID || ':Documents[LIKE %pdf%] were not found ;';
UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE= V_FINAL_ERRORMESSAGE WHERE ISBN= R1.ISBN;
END;--Iterating for each NODEID
END LOOP;--Iterating the assets[Nodes] for each product of catgeory
-- DBMS_OUTPUT.PUT_LINE('R1.ISBN:' || R1.ISBN ||'::V_CATEGORY_COUNT:' || V_CATEGORY_COUNT);
IF(V_CATEGORY_COUNT = 0) THEN
UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE= 'Category is not applied to none of the Assets for this Product' WHERE ISBN= R1.ISBN;
END IF;
EXCEPTION WHEN
NO_DATA_FOUND THEN
UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE= 'Product is not Found:' WHERE ISBN= R1.ISBN;
END;
-- DBMS_OUTPUT.PUT_LINE( R1.ISBN || 'Final documents:'||FINAL_FILNAME);
UPDATE GT_ADD_ISBNS SET FILENAME=FINAL_FILNAME WHERE ISBN= R1.ISBN;
COMMIT;
END LOOP;--looping gt_isbns
END;
You have a number of potential performance hits. Here's one:
"We are iterating 100k+ records from global temporary table"
Global temporary tables can be pretty slow. Populating them means writing all that data to disk; reading from them means reading from disk. That's a lot of I/O which might be avoidable. Also, GTTs use the temporary tablespace so you may be in contention with other sessions doing large sorts.
Here's another red flag:
FOR R1 IN C1 LOOP
... FOR R2 IN C2(R1.ISBN) LOOP
SQL is a set-based language. It is optimised for joining tables and returning sets of data in a highly performant fashion. Nested cursor loops mean row-by-row processing, which is undoubtedly easier to code but can be orders of magnitude slower than the equivalent set operation.
--To check whether Product is exists or not
You have several queries selecting from the same tables (AN, ANP) using the same criteria (isbn). Perhaps all these duplicates are the only way of validating your business rules, but it seems unlikely.
FINAL_FILNAME := FINAL_FILNAME || V_FILENAME ||',';
Maybe you could rewrite your query to use listagg() instead of using procedural logic to concatenate a string?
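For instance, something along these lines (a sketch only; LISTAGG is available from 11.2 onwards, and the table names, QNAME_ID values and join conditions are copied from the posted code so they may not match the real schema):

SELECT LISTAGG(anp.string_value, ',') WITHIN GROUP (ORDER BY anp.string_value) AS filenames
FROM table1 anp
JOIN table2 anpp ON anpp.node_id = anp.node_id
WHERE anp.qname_id = 21
AND anpp.qname_id = 36
AND anp.string_value LIKE '%pdf%';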
UPDATE GT_ADD_ISBNS
Again, all your updates are single row operations instead of set ones.
"Is there any way to make this process faster in the stored procedure itself by doing batch process?"
Without knowing your rules and the context we cannot rewrite your logic for you, but 15-16 hours is way too long for this so you can definitely reduce the elapsed time.
Things to consider:
Replace the writing and reading to the temporary table with the query you use to populate it
Rewrite the loops to use BULK COLLECT with a high LIMIT (e.g. 1000) to improve the select efficiency. Find out more.
Populate arrays and use FORALL to improve the efficiency of the updates. Find out more.
Try to remove all those individual look-ups by incorporating the logic into the main query, using OUTER JOIN syntax to test for existence (see the sketch after this list).
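As an illustration of the last point, an outer join can flag missing products in a single pass instead of catching NO_DATA_FOUND row by row (a sketch only; the table and column names are lifted from the posted code and the join conditions are assumptions):

SELECT gt.isbn,
       CASE WHEN an.id IS NULL THEN 'Product is not Found' END AS error_message
FROM gt_add_isbns gt
LEFT JOIN table2 anp ON anp.value = gt.isbn
LEFT JOIN table1 an ON an.id = anp.node_id;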
These are all guesses. If you really want to know where the procedure is spending the time - and that knowledge is the root of all successful tuning, so you ought to want to know - you should run the procedure under a PL/SQL Profiler. This will tell you which lines cost the most time, and those are usually the ones where you need to focus your tuning effort. If you don't already have access to DBMS_PROFILER you will need a DBA to run the install script for you. Find out more.
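A minimal sketch of a profiler run (it assumes the profiler tables from the install script already exist):

DECLARE
  l_status BINARY_INTEGER;
BEGIN
  l_status := DBMS_PROFILER.START_PROFILER(run_comment => 'isbn batch run');
  -- ... the body of the block being tuned goes here ...
  DBMS_PROFILER.STOP_PROFILER;
END;
/
-- then find the expensive lines:
SELECT u.unit_name, d.line#, d.total_time, d.total_occur
FROM plsql_profiler_units u
JOIN plsql_profiler_data d
  ON d.runid = u.runid AND d.unit_number = u.unit_number
ORDER BY d.total_time DESC;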
" can we change this code into Java and run this code in multi threaded mode?"
Given that one of the reasons for slowing down the procedure is the I/O cost of selecting from the temporary table there's a good chance multi-threading might introduce further contention and actually make things worse. You should seek to improve the stored procedure first.


Running PLSQL script to generate load
For various reasons (reproducing errors, ...) I would like to generate some load (with specific actions) in a PL/SQL script.
What I would like to do:
A) Insert 1,000,000 rows into Schema-A.Table-1
B) In a loop, ideally in parallel (2 or 3 sessions):
1) read one row from Schema-A.Table-1 with locking
2) insert it into Schema-B.Table-2
3) delete the row from Schema-A.Table-1
Is there a way to run the B task in parallel from within a PL/SQL script?
How would this look?
It's usually better to parallelize SQL statements inside a PL/SQL block, instead of trying to parallelize the entire PL/SQL block:
begin
execute immediate 'alter session enable parallel dml';
insert /*+ append parallel */ into schemaA.Table1 ...
commit;
insert /*+ append parallel */ into schemaB.Table2 ...
commit;
delete /*+ parallel */ from schemaA.Table1 where ...
commit;
dbms_stats.gather_table_stats('SCHEMAA', 'TABLE1', degree => 8);
dbms_stats.gather_table_stats('SCHEMAB', 'TABLE2', degree => 8);
end;
/
Large parallel DML statements usually require less code and run faster than creating your own parallelism in PL/SQL. Here are a few things to look out for:
You must have Enterprise Edition, large tables, decent hardware, and a sane configuration to run parallel SQL.
Setting the DOP is difficult. Using the hint /*+ parallel */ lets Oracle decide but you might want to play around with it by specifying a number, such as /*+ parallel(8) */.
Direct-path writes (the append hint) can be significantly faster. But they lock the entire table and the new results won't be recoverable until after the next backup.
Check the execution plan to ensure that direct-path writes are used - look for the operation LOAD AS SELECT instead of LOAD TABLE CONVENTIONAL. Tuning parallel SQL statements is best done with the Real-Time SQL Monitoring reports, found in select dbms_sqltune.report_sql_monitor(sql_id => 'SQL_ID') from dual;
You might want to read through the Parallel Execution Concepts chapter of the manual. Oracle parallelism can be tricky, but it can also make your processes run orders of magnitude faster if you're careful.
If the objective is a fast load, and parallel is just one attempt to get there, then use CTAS instead:
CREATE TABLE new_temp AS SELECT ... FROM old_table ...;
to create the new table, then
CREATE TABLE old_remaining AS SELECT * FROM old_table WHERE NOT EXISTS (... rows already copied into new_temp ...);
Then drop the old table and rename the new tables into place.
These CREATE TABLE ... AS SELECT operations will use whatever parallel options are set at database level.

Oracle insert 1000 rows at a time

I would like to insert 1000 rows at a time with Oracle.
Example:
INSERT INTO MSG(AUTHOR)
SELECT AUTHOR FROM oldDB.MSGLOG
This insert is taking a very long time, but if I limit it with ROWNUM <= 1000 it inserts right away, so I want to create an import that goes through my X number of rows and inserts 1000 at a time.
Thanks
It is rather doubtful that this will really improve performance, particularly given the simplicity of the SELECT statement. That must be doing either a full scan of the table or of an index on author. If that scan is slow, you're much better off diagnosing the underlying problem rather than trying to work around it (for example, perhaps oldDB.MsgLog has a number of empty blocks below the high water mark that forces a full table scan to read many more blocks than is strictly necessary).
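For example, a rough way to check for that (it assumes the optimizer statistics are reasonably current):

-- compare the blocks allocated to the segment with what the row count suggests is needed
select blocks, num_rows, avg_row_len
from all_tables
where owner = 'OLDDB' and table_name = 'MSGLOG';
-- if BLOCKS is far larger than the data warrants, ALTER TABLE ... MOVE (or SHRINK SPACE in an
-- ASSM tablespace) will reset the high water mark; remember to rebuild indexes after a MOVE.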
If you really want to write some more verbose and less efficient PL/SQL to accomplish the task, though, you certainly can:
DECLARE
  TYPE tbl_authors IS TABLE OF msg.author%TYPE;
  l_authors tbl_authors;
  CURSOR author_cursor IS
    SELECT author
    FROM oldDB.MsgLog;
BEGIN
  OPEN author_cursor;
  LOOP
    FETCH author_cursor
      BULK COLLECT INTO l_authors
      LIMIT 1000;
    EXIT WHEN l_authors.count = 0;
    FORALL i IN 1..l_authors.count
      INSERT INTO msg( author )
      VALUES( l_authors(i) );
  END LOOP;
  CLOSE author_cursor;
END;

Fastest way to insert a million rows in Oracle

How can I insert more than a million rows in Oracle in an optimal way for the following procedure? It hangs if I increase the FOR loop to a million rows.
create or replace procedure inst_prc1 as
  xssn    number;
  xcount  number;
  l_start number;
  l_end   number;
  cursor c1 is select max(ssn) S1 from dtr_debtors1;
begin
  l_start := DBMS_UTILITY.GET_TIME;
  for i in 1..10000 loop
    for c1_rec in c1 loop
      insert into dtr_debtors1(SSN) values (c1_rec.S1 + 1);
    end loop;
  end loop;
  commit;
  l_end := DBMS_UTILITY.GET_TIME;
  DBMS_OUTPUT.PUT_LINE('The Procedure Start Time is '||l_start);
  DBMS_OUTPUT.PUT_LINE('The Procedure End Time is '||l_end);
end inst_prc1;
Your approach will lead to memory issues. The fastest way will be this [query edited after David's comment to take care of the null scenario]:
insert into dtr_debtors1(SSN)
select a.S1+level
from dual,(select nvl(max(ssn),0) S1 from dtr_debtors1) a
connect by level <= 10000
An INSERT ... SELECT is the fastest approach here because everything stays inside the SQL engine and, ideally, in RAM.
This query can become slow if it spills into the global temp area, but then that needs DB tuning. I don't think there can be anything faster than this.
A few more details on memory use by the query:
Each query has its own PGA [Program Global Area], which is basically the RAM available to that query. If this area is not sufficient to process the query's results, the SQL engine starts using the global temporary tablespace, which is disk-based, and the query starts becoming slow. If the data needed by the query is so huge that even the temp area is not sufficient, you will get a tablespace error.
So always design a query so that it stays in the PGA; otherwise it's a red flag.
Inserting one row at a time with a single-row INSERT statement inside a loop is slow. The fastest way is to use an INSERT ... SELECT like the following, which generates a million rows and inserts them in bulk.
insert into dtr_debtors1(SSN)
select level from dual connect by level <= 1000000;
Try dropping all the indexes created on your table, then insert using the SELECT query, and recreate the indexes afterwards; this helps when inserting millions of rows into your database.
1) If you want to insert using PL/SQL, then fetch with BULK COLLECT INTO and insert with FORALL.
2) For multi-row inserts in plain SQL, use the INSERT ALL statement (see the sketch after this list).
3) Another method is INSERT INTO <tb_nm> SELECT.
4) Use the SQL*Loader utility.
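For reference, a minimal INSERT ALL sketch; the table and column come from the question, and the literal values are made up:

INSERT ALL
  INTO dtr_debtors1 (ssn) VALUES (1001)
  INTO dtr_debtors1 (ssn) VALUES (1002)
  INTO dtr_debtors1 (ssn) VALUES (1003)
SELECT * FROM dual;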

Oracle pl sql 10g - move set of rows from a table to a history table with same structure

A PL/SQL job moves older versions of data from a transaction table to a history table of the same structure, and archives them for a certain period:
for each record
insert into tab_hist (select older_versions of current row);
delete from tab (select older_versions of current row);
END
PS: earlier we were not archiving (no insert), but after adding the insert the run time has doubled. So can we accomplish the insert and delete with a single select statement, given that there is a large amount of data to be processed across multiple tables?
This is a batch operation, right? In which case you should avoid Row By Row and use set processing. SQL is all about The Joy Of Sets.
Oracle has fantastic bulk SQL processing capabilities. The pseudo code you posted would look something like this:
declare
    cursor c_oldrecs is
        select * from your_table
        where criterion between some_date and some_other_date;
    type rec_nt is table of your_table%rowtype;
    oldrecs_coll rec_nt;
begin
    open c_oldrecs;
    loop
        fetch c_oldrecs bulk collect into oldrecs_coll limit 1000;
        exit when oldrecs_coll.count() = 0;
        forall i in oldrecs_coll.first()..oldrecs_coll.last()
            insert into your_table_hist
            values oldrecs_coll(i);
        forall i in oldrecs_coll.first()..oldrecs_coll.last()
            delete from your_table
            where pk_col = oldrecs_coll(i).pk_col;
    end loop;
    close c_oldrecs;
end;
/
This bulk processing is faster because each FORALL sends one thousand rows' worth of work to the SQL engine in a single call, instead of switching between PL/SQL and SQL one thousand times. The LIMIT 1000 clause is there to prevent a really huge selection from blowing the PGA. That safeguard may not be necessary in your case, or perhaps you can work with a higher value.
I think your current implementation is wrong. It is better to keep only the current version in the live table, and to keep all the historical versions in a separate table from the off. Use triggers to maintain the history as part of every transaction.
It may be that the slowness you are seeing is due to the logic that selects which rows are to be moved. If so, you might get better results by doing the select once to get the rowids into a nested table in memory, then doing the insert and the delete based on that list; or alternatively, driving your loop with a query that selects the rows to be moved.
You might instead consider creating a trigger on insert that will move the existing rows that "match" the row being inserted. This will slow down the inserts somewhat, but would mean you don't need any process to move the old rows in bulk.
If you are on Enterprise edition with the partitioning option, look at partition exchange.
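A rough sketch, assuming the live table is range-partitioned by the archive date and the names are illustrative (partition exchange is a dictionary operation, so the data moves without row-by-row DML):

ALTER TABLE your_table
  EXCHANGE PARTITION p_2020_q1 WITH TABLE your_table_hist_stage
  INCLUDING INDEXES WITHOUT VALIDATION;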
As simple as this:
CREATE TABLE BACKUP_TAB AS SELECT * FROM TAB;
If you are deleting a lot of rows you will be hitting your undo tablespace, and a delete of, say, 100k rows can cause performance issues. You are better off deleting in batches of, say, 5k rows at a time and committing.
BEGIN
  -- the WHERE condition on the insert and the delete must be the same
  loop
    INSERT INTO BACKUP_TAB
    SELECT * FROM TAB
    WHERE 1=1            -- your condition here
    AND rownum < 5000;
    exit when SQL%rowcount < 4999;
    commit;
  end loop;
  loop
    DELETE FROM TAB
    WHERE 1=1            -- your condition here
    AND rownum < 5000;
    exit when SQL%rowcount < 4999;
    commit;
  end loop;
  commit;
END;
