I am trying to copy records from one table to another as fast as possible.
Currently I have a simple cursor loop similar to this:
FOR rec IN source_cursor LOOP
INSERT INTO destination (a, b) VALUES (rec.a, rec.b);
END LOOP;
I want to speed it up to be super fast, so I am trying some BULK operations (a BULK FETCH, then a FORALL insert).
Here is what I have for the bulk select / FORALL insert:
DECLARE
TYPE t__event_rows IS TABLE OF _event%ROWTYPE;
v__event_rows t__event_rows;
CURSOR c__events IS
SELECT * FROM _EVENT ORDER BY MESSAGE_ID;
BEGIN
OPEN c__events;
LOOP
FETCH c__events BULK COLLECT INTO v__event_rows LIMIT 10000; -- limit to 10k to avoid out of memory
EXIT WHEN c__events%NOTFOUND;
FORALL i IN 1..v__event_rows.COUNT SAVE EXCEPTIONS
INSERT INTO destination
( col1, col2, a_sequence)
VALUES
( v__event_rows(i).col1, v__event_rows(i).col2, SOMESEQUENCE.NEXTVAL );
END LOOP;
CLOSE c__events;
END;
My problem is that I'm not seeing any big gains in performance so far. From what I read it should be 10x-100x faster.
Am I missing a bottleneck here somewhere?
The only benefit your code has over a simple INSERT+SELECT is that you save exceptions, plus (as Justin points out) you have a pointless ORDER BY which is making it do a whole lot of meaningless work. You then don't have any code to do anything with the exceptions that were saved, anyway.
I'd just implement it as an INSERT+SELECT.
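As a rough illustration only (reusing the table, column and sequence names from your snippet, which may need adjusting), the whole job collapses to something like:
INSERT INTO destination (col1, col2, a_sequence)
SELECT e.col1, e.col2, SOMESEQUENCE.NEXTVAL  -- NEXTVAL is allowed in the SELECT list of an INSERT ... SELECT
FROM _event e;
A single statement like this also avoids all the context switching between PL/SQL and SQL that the loop incurs.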
You do not need to use loops unless the logic actually requires them.
Related
We are iterating over 100k+ records from a global temporary table. The stored procedure below iterates over all records from the global temp table one by one and has to perform the following three steps:
Check whether the product exists.
Check whether the assets inside the product have the 'category' applied.
Check whether the assets have file names matching '%pdf%'.
Each record has to go through these 3 steps; for a successful record the final document names are stored in the table, and if an error occurs in any of the steps then the error message is stored for that record.
The stored procedure below is taking a long time because it processes the records sequentially.
Is there any way to make this faster in the stored procedure itself by doing batch processing?
If it's not possible in the stored procedure, can we convert this code to Java and run it in multi-threaded mode, e.g. creating 10 threads where each thread takes one record concurrently and processes it? I would be happy if somebody could give some pseudo code.
Which approach would you suggest?
DECLARE
V_NODE_ID VARCHAR2(20);
V_FILENAME VARCHAR2(100);
V_CATEGORY_COUNT INTEGER :=0;
FINAL_FILNAME VARCHAR2(2000);
V_FINAL_ERRORMESSAGE VARCHAR2(2000);
CURSOR C1 IS
SELECT isbn FROM GT_ADD_ISBNS GT;
CURSOR C2(v_isbn in varchar2) IS
SELECT ANP.NODE_ID NODE_ID
FROM
table1 ANP,
table2 ANPP,
table3 AN
WHERE
ANP.NODE_ID=AN.ID AND
ANPP.NODE_ID=ANP.NODE_ID AND
AN.NAME_ID =26 AND
ANP.CATEGORY='category' AND
ANP.QNAME_ID='categories' AND
ANP.NODE_ID IN(SELECT CHILD_NODE_ID
FROM TABLE_ASSOC START WITH PARENT_NODE_ID IN(v_isbn)
CONNECT BY PRIOR CHILD_NODE_ID = PARENT_NODE_ID);
BEGIN
--Iterating all Products
FOR R1 IN C1
LOOP
FINAL_FILNAME :='';
BEGIN
--To check whether the Product exists or not
SELECT AN.ID INTO V_NODE_ID
FROM TABLE1 AN,
TABLE2 ANP
WHERE
AN.ID=ANP.NODE_ID AND
ANP.VALUE in(R1.ISBN);
V_CATEGORY_COUNT :=0;
V_FINAL_ERRORMESSAGE :='';
--To check whether the assets inside the Product have the 'category' applied or not
FOR R2 IN C2(R1.ISBN)
LOOP
V_CATEGORY_COUNT := V_CATEGORY_COUNT+1;
BEGIN
--For assets where the 'category' is applied, check whether they have documents LIKE '%pdf%' or not
SELECT ANP.STRING_VALUE into V_FILENAME
FROM
table1 ANP,
table2 ANPP,
table3 ACD
WHERE
ANP.QNAME_ID=21 AND
ACD.ID=ANPP.LONG_VALUE AND
ANP.NODE_ID=ANPP.NODE_ID AND
ANPP.QNAME_ID=36 AND
ANP.STRING_VALUE LIKE '%pdf%' AND
ANP.NODE_ID=R2.NODE_ID;
FINAL_FILNAME := FINAL_FILNAME || V_FILENAME ||',';
EXCEPTION WHEN
NO_DATA_FOUND THEN
V_FINAL_ERRORMESSAGE:=V_FINAL_ERRORMESSAGE|| 'Category is applied for this Product But for the asset:'|| R2.NODE_ID || ':Documents[LIKE %pdf%] were not found ;';
UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE= V_FINAL_ERRORMESSAGE WHERE ISBN= R1.ISBN;
END;--Iterating for each NODEID
END LOOP;--Iterating the assets[Nodes] for each product of category
-- DBMS_OUTPUT.PUT_LINE('R1.ISBN:' || R1.ISBN ||'::V_CATEGORY_COUNT:' || V_CATEGORY_COUNT);
IF(V_CATEGORY_COUNT = 0) THEN
UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE= 'Category is not applied to any of the Assets for this Product' WHERE ISBN= R1.ISBN;
END IF;
EXCEPTION WHEN
NO_DATA_FOUND THEN
UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE= 'Product is not Found:' WHERE ISBN= R1.ISBN;
END;
-- DBMS_OUTPUT.PUT_LINE( R1.ISBN || 'Final documents:'||FINAL_FILNAME);
UPDATE GT_ADD_ISBNS SET FILENAME=FINAL_FILNAME WHERE ISBN= R1.ISBN;
COMMIT;
END LOOP;--looping gt_isbns
END;
You have a number of potential performance hits. Here's one:
"We are iterating 100k+ records from global temporary table"
Global temporary tables can be pretty slow. Populating them means writing all that data to disk; reading from them means reading from disk. That's a lot of I/O which might be avoidable. Also, GTTs use the temporary tablespace so you may be in contention with other sessions doing large sorts.
Here's another red flag:
FOR R1 IN C1 LOOP
... FOR R2 IN C2(R1.ISBN) LOOP
SQL is a set-based language. It is optimised for joining tables and returning sets of data in a highly performant fashion. Nested cursor loops mean row-by-row processing, which is undoubtedly easier to code but may be orders of magnitude slower than the equivalent set operation would be.
--To check whether the Product exists or not
You have several queries selecting from the same tables (AN, ANP) using the same criteria (isbn). Perhaps all these duplicates are the only way of validating your business rules but it seems unlikely.
FINAL_FILNAME := FINAL_FILNAME || V_FILENAME ||',';
Maybe you could rewrite your query to use listagg() instead of using procedural logic to concatenate a string?
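As a rough sketch (LISTAGG needs 11gR2 or later; this reuses the tables and filters from your inner query and would replace both the R2 loop body and the string concatenation, though the category predicates from cursor C2 would still need folding in):
SELECT LISTAGG(ANP.STRING_VALUE, ',') WITHIN GROUP (ORDER BY ANP.STRING_VALUE)
INTO FINAL_FILNAME
FROM table1 ANP, table2 ANPP, table3 ACD
WHERE ACD.ID = ANPP.LONG_VALUE
AND ANP.NODE_ID = ANPP.NODE_ID
AND ANP.QNAME_ID = 21
AND ANPP.QNAME_ID = 36
AND ANP.STRING_VALUE LIKE '%pdf%'
AND ANP.NODE_ID IN (SELECT CHILD_NODE_ID
                    FROM TABLE_ASSOC START WITH PARENT_NODE_ID IN (R1.ISBN)
                    CONNECT BY PRIOR CHILD_NODE_ID = PARENT_NODE_ID);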
UPDATE GT_ADD_ISBNS
Again, all your updates are single row operations instead of set ones.
"Is there any way to make this process faster in the stored procedure itself by doing batch process?"
Without knowing your rules and the context we cannot rewrite your logic for you, but 15-16 hours is way too long for this so you can definitely reduce the elapsed time.
Things to consider:
Replace the writing to and reading from the temporary table with the query you use to populate it
Rewrite the loops to use BULK COLLECT with a high LIMIT (e.g. 1000) to improve the select efficiency. Find out more.
Populate arrays and use FORALL to improve the efficiency of the updates (a rough sketch follows this list). Find out more.
Try to remove all those individual look-ups by incorporating the logic into the main query, using OUTER JOIN syntax to test for existence.
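Here is the promised sketch of the BULK COLLECT / FORALL shape. The error-message derivation is deliberately left as a placeholder, since that is your business logic, and the batch size is only an example:
DECLARE
  TYPE t_isbns IS TABLE OF GT_ADD_ISBNS.ISBN%TYPE;
  TYPE t_errors IS TABLE OF GT_ADD_ISBNS.ERROR_MESSAGE%TYPE;
  l_isbns t_isbns;
  l_errors t_errors;
  CURSOR c_isbns IS SELECT isbn FROM GT_ADD_ISBNS;
BEGIN
  OPEN c_isbns;
  LOOP
    FETCH c_isbns BULK COLLECT INTO l_isbns LIMIT 1000;
    EXIT WHEN l_isbns.COUNT = 0;
    l_errors := t_errors();
    l_errors.EXTEND(l_isbns.COUNT);
    FOR i IN 1 .. l_isbns.COUNT LOOP
      l_errors(i) := NULL; -- placeholder: your real existence/category/pdf checks go here
    END LOOP;
    -- one bulk UPDATE per batch instead of one UPDATE per row
    FORALL i IN 1 .. l_isbns.COUNT
      UPDATE GT_ADD_ISBNS
      SET ERROR_MESSAGE = l_errors(i)
      WHERE ISBN = l_isbns(i);
  END LOOP;
  CLOSE c_isbns;
END;
/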
These are all guesses. If you really want to know where the procedure is spending the time - and that knowledge is the root of all successful tuning, so you ought to want to know - you should run the procedure under a PL/SQL Profiler. This will tell you which lines cost the most time, and those are usually the ones where you need to focus your tuning effort. If you don't already have access to DBMS_PROFILER you will need a DBA to run the install script for you. Find out more.
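A minimal profiling run looks something like this, assuming the profiler tables have been created by the proftab.sql install script:
BEGIN
  DBMS_PROFILER.START_PROFILER('gt_add_isbns tuning run');  -- the run comment is arbitrary
END;
/
-- run the stored procedure you want to profile here
BEGIN
  DBMS_PROFILER.STOP_PROFILER;
END;
/
-- then see which lines cost the most time
SELECT u.unit_name, d.line#, d.total_occur, d.total_time
FROM plsql_profiler_units u
JOIN plsql_profiler_data d ON d.runid = u.runid AND d.unit_number = u.unit_number
ORDER BY d.total_time DESC;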
" can we change this code into Java and run this code in multi threaded mode?"
Given that one of the reasons for slowing down the procedure is the I/O cost of selecting from the temporary table there's a good chance multi-threading might introduce further contention and actually make things worse. You should seek to improve the stored procedure first.
I would like to insert 1000 rows at a time with Oracle.
Example:
INSERT INTO MSG(AUTHOR)
SELECT AUTHOR FROM oldDB.MSGLOG
This insert is taking a very long time, but if I limit it with ROWNUM <= 1000 it will insert right away, so I want to create an import that goes through my X number of rows and inserts 1000 at a time.
Thanks
It is rather doubtful that this will really improve performance particularly given the simplicity of the SELECT statement. That must be doing either a full scan of the table or of an index on author. If that scan is slow, you're much better off diagnosing the underlying problem rather than trying to work around it (for example, perhaps oldDB.MsgLog has a number of empty blocks below the high water mark that forces a full table scan to read many more blocks than is strictly necessary).
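One quick sanity check for that (assuming oldDB is a schema visible in ALL_TABLES and statistics are reasonably fresh) is to compare the block count with what the row count justifies; if the table is mostly empty space, moving or shrinking it resets the high water mark:
SELECT table_name, num_rows, blocks, avg_row_len
FROM all_tables
WHERE owner = 'OLDDB'
AND table_name = 'MSGLOG';
-- if blocks is far larger than num_rows * avg_row_len suggests:
-- ALTER TABLE oldDB.MsgLog ENABLE ROW MOVEMENT;
-- ALTER TABLE oldDB.MsgLog SHRINK SPACE;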
If you really want to write some more verbose and less efficient PL/SQL to accomplish the task, though, you certainly can:
DECLARE
TYPE tbl_authors IS TABLE OF msg.author%TYPE;
l_authors tbl_authors;
CURSOR author_cursor
IS SELECT author
FROM oldDB.MsgLog;
BEGIN
OPEN author_cursor;
LOOP
FETCH author_cursor
BULK COLLECT INTO l_authors
LIMIT 1000;
EXIT WHEN l_authors.count = 0;
FORALL i IN 1..l_authors.count
INSERT INTO msg( author )
VALUES( l_authors(i) );
END LOOP;
CLOSE author_cursor;
END;
How can I insert more than a million rows in Oracle in an optimal way for the following procedure? It hangs if I increase the FOR loop to a million rows.
create or replace procedure inst_prc1 as
xssn number;
xcount number;
l_start Number;
l_end Number;
cursor c1 is select max(ssn) S1 from dtr_debtors1;
Begin
l_start := DBMS_UTILITY.GET_TIME;
FOR I IN 1..10000 LOOP
For C1_REC IN C1 Loop
insert into dtr_debtors1(SSN) values (C1_REC.S1+1);
End loop;
END LOOP;
commit;
l_end := DBMS_UTILITY.GET_TIME;
DBMS_OUTPUT.PUT_LINE('The Procedure Start Time is '||l_start);
DBMS_OUTPUT.PUT_LINE('The Procedure End Time is '||l_end);
End inst_prc1;
Your approach will lead to memory issues. The fastest way will be this [query edited after David's comment to take care of the null scenario]:
insert into dtr_debtors1(SSN)
select a.S1+level
from dual,(select nvl(max(ssn),0) S1 from dtr_debtors1) a
connect by level <= 10000
An insert-select is the fastest approach as everything stays in RAM.
This query can become slow if it spills into the global temp area, but then that needs DB tuning. I don't think there can be anything faster than this.
A few more details on memory use by the query:
Each query has its own PGA [Program Global Area], which is basically the RAM available to that query. If this area is not sufficient to return the query results then the SQL engine starts using the global temp tablespace, which is disk-based, and the query starts becoming slow. If the data needed by the query is so huge that even the temp area is not sufficient then you will get a tablespace error.
So always design the query so that it stays in the PGA; otherwise it's a red flag.
Inserting one row at a time with a single insert statement inside a loop is slow. The fastest way is to use an insert-select like the following, which generates a million rows and bulk inserts them.
insert into dtr_debtors1(SSN)
select level from dual connect by level <= 1000000;
Try dropping all the indexes created on your table and then insert using the select query. You can try this link, which will help you insert millions of rows quickly into your database.
1) If you want to insert using PL/SQL, then use BULK COLLECT INTO for the query and FORALL for the insert DML.
2) In SQL, use a multi-row insert with the INSERT ALL statement (example below).
3) Another method: INSERT INTO <tb_nm> SELECT.
4) Use the SQL*Loader utility.
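For completeness, a multi-row INSERT ALL (point 2) looks roughly like this; the SSN values are just placeholders:
INSERT ALL
  INTO dtr_debtors1 (ssn) VALUES (1001)
  INTO dtr_debtors1 (ssn) VALUES (1002)
  INTO dtr_debtors1 (ssn) VALUES (1003)
SELECT * FROM dual;
For generating a million rows, though, the INSERT ... SELECT ... CONNECT BY shown in the other answers is usually the simpler option.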
I have a cursor which is
CURSOR B_CUR IS select DISTINCT big_id from TEMP_TABLE;
This would return multiple values. Earlier it was being used as
FOR b_id IN B_CUR LOOP
select s.col1, s.col2 INTO var1, var2 from sometable s where s.col3 = b_id.col1;
END LOOP;
Earlier it was certain that the inner select query would always return 1 row. Now this query can return multiple rows. How can I change this logic?
I was thinking of creating a nested cursor which would fetch into an array of a record type (which I will declare), but I have no idea how a nested cursor would work here.
My main concern is efficiency, since it would be working on millions of records per execution. Could you suggest what would be the best approach here?
Normally, you would just join the two tables.
FOR some_cursor IN (SELECT s.col1,
s.col2
FROM sometable s
JOIN temp_table t ON (s.col3 = t.col1))
LOOP
<<do something>>
END LOOP;
Since you are concerned about efficiency, however:
Is TEMP_TABLE really a temporary table? If so, why? It is exceedingly rare that Oracle actually needs to use temporary tables so that leads me to suspect that you're probably doing something inefficient to populate the temporary table in the first place.
Why do you have a cursor FOR loop to process the data from TEMP_TABLE? Row-by-row processing is the slowest way to do anything in PL/SQL so it would generally be avoided if you're concerned about efficiency. From a performance standpoint, you want to maximize SQL so that rather than doing a loop that did a series of single-row INSERT or UPDATE operations, you'd do a single INSERT or UPDATE that modified an entire set of rows. If you really need to process data in chunks, that's where PL/SQL collections and bulk processing would come in to play but that will not be as efficient as straight SQL.
Why do you have the DISTINCT in your query against TEMP_TABLE? Do you really expect that there will be duplicate big_id values that are not erroneous? Most of the time, people use DISTINCT incorrectly either to cover up problems where data has been joined incorrectly or where you're forcing Oracle to do an expensive sort just in case incorrect data gets created in the future when a constraint would be the more appropriate way to protect yourself.
FOR b_id IN B_CUR LOOP
for c_id in (select s.col1, s.col2 from sometable s where s.col3 = b_id.col1) loop
......
end loop;
END LOOP;
How do I query a bulk collection? For example, I have
select name
bulk collect into namesValues
from table1
where namesValues is dbms_sql.varchar2_table.
Now, I have another table XYZ which contains
name    is_valid
----    --------
v
h
I want to update is_valid to 'Y' if name is in table1 else 'N'. Table1 has 10 million rows. After bulk collecting I want to execute
update xyz
set is_valid ='Y'
where name in namesValues.
How do I query namesValues? Or is there another option? Table1 has no index.
Please help.
As Tom Kyte (Oracle Corp. Vice President) says:
My mantra, that I'll be sticking with thank you very much, is:
You should do it in a single SQL statement if at all possible.
If you cannot do it in a single SQL Statement, then do it in PL/SQL.
If you cannot do it in PL/SQL, try a Java Stored Procedure.
If you cannot do it in Java, do it in a C external procedure.
If you cannot do it in a C external routine, you might want to
seriously think about why it is you need to do it…
think in sets...
learn all there is to learn about SQL...
You should perform your update in SQL if you can. If you need to add an index to do this then that might be preferable to looping through a collection populated with BULK COLLECT.
If however, this is some sort of assignment....
You should specify it as such but here's how you would do it.
I have assumed that your DB server does not have the capacity to hold 10 million records in memory so rather than BULK COLLECTing all 10 million records in one go I have put the BULK COLLECT into a loop to reduce your memory overheads. If this is not the case then you can omit the bulk collect loop.
DECLARE
c_bulk_limit CONSTANT PLS_INTEGER := 500000;
--
CURSOR names_cur
IS
SELECT name
FROM table1;
--
TYPE namesValuesType IS TABLE OF table1.name%TYPE
INDEX BY PLS_INTEGER;
namesValues namesValuesType;
BEGIN
-- Populate the collection
OPEN names_cur;
LOOP
-- Fetch the records in a loop limiting them
-- to the c_bulk_limit amount at a time
FETCH names_cur BULK COLLECT INTO namesValues
LIMIT c_bulk_limit;
-- Process the records in your collection
FORALL x IN INDICES OF namesValues
UPDATE xyz
SET is_valid ='Y'
WHERE name = namesValues(x)
AND is_valid != 'Y';
-- Set up loop exit criteria
EXIT WHEN namesValues.COUNT < c_bulk_limit;
END LOOP;
CLOSE names_cur;
-- You want to update all remaining rows to 'N'
UPDATE xyz
SET is_valid ='N'
WHERE is_valid IS NULL;
EXCEPTION
WHEN others
THEN
IF names_cur%ISOPEN
THEN
CLOSE names_cur;
END IF;
-- Re-raise the exception;
RAISE;
END;
/
Depending upon your rollback segment sizes etc. you may want to issue interim commits within the bulk collect loop but be aware that you will not then be able to rollback these changes. I deliberately haven't added any COMMITs to this so you can choose where to put them to suit your system.
You also might want to change the size of the c_bulk_limit constant depending upon the resources available to you.
Your update will still cause you problems if the xyz table is large and there is no index on the name column.
Hope it helps...
"Table1 has no index."
Well there's your problem right there. Why not? Put an index on TABLE1.NAME and use a normal SQL UPDATE to amend the data in XYZ.
Trying to solve this problem with bulk collect is not the proper approach.
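As a sketch of that single-statement approach (table and column names taken from the question; the ROWNUM = 1 guard just protects against duplicate names in table1):
UPDATE xyz x
SET x.is_valid = NVL((SELECT 'Y'
                      FROM table1 t
                      WHERE t.name = x.name
                      AND ROWNUM = 1), 'N');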