I would like to insert 1000 rows at a time with oracle
Example:
INSERT INTO MSG(AUTHOR)
SELECT AUTHOR FROM oldDB.MSGLOG
This insert is taking a very long time, but if I limit it with ROWNUM <= 1000 it will insert right away, so I want to create an import that goes through my X number of rows and inserts 1000 at a time.
Thanks
It is rather doubtful that this will really improve performance particularly given the simplicity of the SELECT statement. That must be doing either a full scan of the table or of an index on author. If that scan is slow, you're much better off diagnosing the underlying problem rather than trying to work around it (for example, perhaps oldDB.MsgLog has a number of empty blocks below the high water mark that forces a full table scan to read many more blocks than is strictly necessary).
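One quick way to test that theory, sketched below, is to compare the size of the segment the scan has to read with the number of rows it actually returns. This assumes you can see DBA_SEGMENTS (or can run the equivalent USER_SEGMENTS query as the owner of MSGLOG); the schema name 'OLDDB' is a guess based on the question.
SELECT blocks, bytes / 1024 / 1024 AS mb
  FROM dba_segments
 WHERE owner = 'OLDDB'              -- adjust to the real owner of MSGLOG
   AND segment_name = 'MSGLOG';

SELECT COUNT(*) FROM oldDB.MsgLog;
If the segment is many times larger than the row count justifies, reclaiming the space (for example with ALTER TABLE ... MOVE followed by rebuilding any indexes, or ALTER TABLE ... SHRINK SPACE where row movement is enabled) will usually buy you far more than batching the insert.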
If you really want to write some more verbose and less efficient PL/SQL to accomplish the task, though, you certainly can
DECLARE
TYPE tbl_authors IS TABLE OF msg.author%TYPE;
l_authors tbl_authors;
CURSOR author_cursor
IS SELECT author
FROM oldDB.MsgLog;
BEGIN
OPEN author_cursor;
LOOP
FETCH author_cursor
BULK COLLECT INTO l_authors
LIMIT 1000;
EXIT WHEN l_authors.count = 0;
FORALL i IN 1..l_authors.count
INSERT INTO msg( author )
VALUES( l_authors(i) );
END LOOP;
CLOSE author_cursor;
END;
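If the single-statement INSERT itself is the slow part, a direct-path load is usually a better lever than chunking. A sketch, assuming a one-off copy where MSG can tolerate the exclusive lock a direct-path insert holds until commit (and noting that the hint is silently ignored if MSG has enabled triggers or referential constraints):
INSERT /*+ APPEND */ INTO msg( author )
SELECT author
  FROM oldDB.MsgLog;
COMMIT;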
I've searched across the whole internet for examples, but I still can't get my head around why I cannot use a DML statement inside this cursor loop. I'm missing the theory behind it, but I won't deny that an example of how to write this correctly would make my life a lot easier as well. Here is the query I'm working on (note: I removed the EXIT WHEN NOT FOUND checks, the close-if-cursor-already-open handling and things like that, just to focus on the main point here):
DECLARE
-- lots of vars
-- the cursor below gets all datasources connected to node XXYZ123
CURSOR DataSourceCheck
IS
SELECT NODENAAM, NAAM, URL, DBNODE1, DBNODE2, DBUSERNAAM, DBNAAM
FROM SCHEMA.TABLENAME
WHERE NODENAAM = 'XXYZ123';
-- this cursor is executed row by row, based on the result set of the cursor above
CURSOR CheckIfOnlyDataSource
IS
SELECT NODENAAM, NAAM, URL, DBNODE1, DBNODE2, DBUSERNAAM, DBNAAM
FROM SCHEMA.TABLENAME
WHERE DBUSERNAAM = var_dbusernaam AND (DBNode1 = var_dbnode1 OR DBNode2 = var_dbnode2);
BEGIN
OPEN DataSourceCheck;
LOOP
FETCH DataSourceCheck into var_nodenaam, var_naam, var_URL, var_dbnode1, var_dbnode2, var_dbusernaam, var_dbnaam;
var_rowcount:= 0;
OPEN CheckIfOnlyDataSource;
LOOP
FETCH CheckIfOnlyDataSource into var_nodenaam2, var_naam2, var_URL2, var_dbnode12, var_dbnode22, var_dbusernaam2, var_dbnaam2;
var_rowcount:= var_rowcount + 1;
END LOOP;
-- only save the result in a temp table when var_rowcount is exactly 1
IF var_rowcount = 1
THEN
INSERT INTO global_temp_table
(t_dbusernaam, t_nodenaam, t_dbnode1, t_dbnode2, t_distinctcount)
VALUES
(var_dbusernaam2, var_nodenaam2, var_dbnode12, var_dbnode22, var_rowcount);
END IF;
CLOSE CheckIfOnlyDataSource;
END LOOP;
END;
The point of failure is this part, with the message that DML should be reconfigured into FORALL or BULK INTO statements:
IF var_rowcount = 1
THEN
INSERT INTO global_temp_table
(t_dbusernaam, t_nodenaam, t_dbnode1, t_dbnode2, t_distinctcount)
VALUES
(var_dbusernaam2, var_nodenaam2, var_dbnode12, var_dbnode22, var_rowcount);
END IF;
I don't understand why DML wouldn't work in a row-by-row approach. The output is clearly stored inside the variables var_dbusernaam2, var_nodenaam2, var_dbnode12 and var_dbnode22, since I can do a dbms_output.put_line to show them. But if it is already stored in the variables, why can't I simply store it in a table (this isn't billions of rows of bulk data, not even 1000 records!)?
Is there no simple workaround? I gave BULK COLLECT and FORALL a try, but I need to invest a lot more time to understand them and get the query right, and the cursor within a cursor definitely won't make it any easier.
In addition to the suggestion in Mottor's answer, the reason why Toad is flagging up your code is because row-by-row processing is slow. You've got a lot of context switching going on between the PL/SQL and SQL engines.
Think of it like building a new wall near your house. If the bricks are delivered at the bottom of the drive, do you:
Go to the pile of bricks
Pick up a single brick
Go back to your wall
Add the brick onto the wall
Go back to step 1 and repeat until the wall is complete
(This is the equivalent of row-by-row processing)
Or:
Take your wheelbarrow down to the pile of bricks
Load your wheelbarrow with as many bricks as will fit and/or you can carry
Take the wheelbarrow back over to the wall
Add each brick into the wall
Go back to step 1 and repeat until the wall is complete.
(This is the equivalent of bulk processing.)
Of course, if you're canny, you could avoid all the walking and carrying required in the above scenarios by getting the bricks delivered right next to the wall in the first place. (This is the equivalent of set-based processing).
Turning your procedure into a set-based approach (incorporating Mottor's answer) would make it simply:
declare
-- lots of vars
begin
insert into global_temp_table (t_dbusernaam,
t_nodenaam,
t_dbnode1,
t_dbnode2,
t_distinctcount)
select dbusernaam,
nodenaam,
dbnode1,
dbnode2,
cnt
from (select nodenaam,
naam,
url,
dbnode1,
dbnode2,
dbusernaam,
dbnaam,
count(*) over (partition by dbnode1, dbnode2, dbusernaam) cnt
from schema.tablename
where nodenaam = 'XXYZ123')
where cnt = 1;
end;
/
This has the advantage of being more compact than your original code, making it easier to read, understand and therefore debug. Plus you can run the select statement on its own outside of the procedure - much easier to see what it's doing that way.
It will also be faster than your original approach of looping through two cursors (which, by the way, was reinventing the nested loop join - something that the database is optimised to do in pure SQL... and may not be the fastest way of doing the join anyway, if you had been stuck with keeping the join!).
I'd also be interested to know why you need to insert the rows into the GLOBAL_TEMP_TABLE (which I suspect is a GTT - global temporary table - rather than a normal heap table) - can you not do the subsequent processing in a single SQL statement, using the above select statement rather than inserting the data into the GTT?
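For example, a sketch of what that could look like (the some_next_step table and its join column are made up, since we don't know what the follow-on processing is; the inner query is just the one from above):
with single_source_nodes as (
  select dbusernaam, nodenaam, dbnode1, dbnode2, cnt
    from (select nodenaam,
                 dbusernaam,
                 dbnode1,
                 dbnode2,
                 count(*) over (partition by dbnode1, dbnode2, dbusernaam) cnt
            from schema.tablename
           where nodenaam = 'XXYZ123')
   where cnt = 1
)
select s.*
  from single_source_nodes s
  join some_next_step d              -- hypothetical: whatever the GTT currently feeds
    on d.dbusernaam = s.dbusernaam;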
This is not an error, but a Toad code analysis suggestion (Rule 4809).
P.S. If the table is the same in both queries, you can use
..., COUNT(*) OVER (PARTITION BY DBNODE1, DBNODE2, DBUSERNAAM) c
in the first query to get the number of rows per DBNODE1, DBNODE2, DBUSERNAAM, so you don't need the second one.
Can you help me to understand this phrase?
Without the bulk bind, PL/SQL sends a SQL statement to the SQL engine
for each record that is inserted, updated, or deleted leading to
context switches that hurt performance.
Within Oracle, there is a SQL virtual machine (VM) and a PL/SQL VM. When you need to move from one VM to the other VM, you incur the cost of a context shift. Individually, those context shifts are relatively quick, but when you're doing row-by-row processing, they can add up to account for a significant fraction of the time your code is spending. When you use bulk binds, you move multiple rows of data from one VM to the other with a single context shift, significantly reducing the number of context shifts, making your code faster.
Take, for example, an explicit cursor. If I write something like this
DECLARE
CURSOR c
IS SELECT *
FROM source_table;
l_rec source_table%rowtype;
BEGIN
OPEN c;
LOOP
FETCH c INTO l_rec;
EXIT WHEN c%notfound;
INSERT INTO dest_table( col1, col2, ... , colN )
VALUES( l_rec.col1, l_rec.col2, ... , l_rec.colN );
END LOOP;
CLOSE c;
END;
then every time I execute the fetch, I am
Performing a context shift from the PL/SQL VM to the SQL VM
Asking the SQL VM to execute the cursor to generate the next row of data
Performing another context shift from the SQL VM back to the PL/SQL VM to return my single row of data
And every time I insert a row, I'm doing the same thing. I am incurring the cost of a context shift to ship one row of data from the PL/SQL VM to the SQL VM, asking the SQL VM to execute the INSERT statement, and then incurring the cost of another context shift back to PL/SQL.
If source_table has 1 million rows, that's 4 million context shifts which will likely account for a reasonable fraction of the elapsed time of my code. If, on the other hand, I do a BULK COLLECT with a LIMIT of 100, I can eliminate 99% of my context shifts by retrieving 100 rows of data from the SQL VM into a collection in PL/SQL every time I incur the cost of a context shift and inserting 100 rows into the destination table every time I incur a context shift there.
If I rewrite my code to make use of bulk operations
DECLARE
CURSOR c
IS SELECT *
FROM source_table;
TYPE nt_type IS TABLE OF source_table%rowtype;
l_arr nt_type;
BEGIN
OPEN c;
LOOP
FETCH c BULK COLLECT INTO l_arr LIMIT 100;
EXIT WHEN l_arr.count = 0;
FORALL i IN 1 .. l_arr.count
INSERT INTO dest_table( col1, col2, ... , colN )
VALUES( l_arr(i).col1, l_arr(i).col2, ... , l_arr(i).colN );
END LOOP;
CLOSE c;
END;
Now, every time I execute the fetch, I retrieve 100 rows of data into my collection with a single set of context shifts. And every time I do my FORALL insert, I am inserting 100 rows with a single set of context shifts. If source_table has 1 million rows, this means that I've gone from 4 million context shifts to 40,000 context shifts. If context shifts accounted for, say, 20% of the elapsed time of my code, I've eliminated 19.8% of the elapsed time.
You can increase the size of the LIMIT to further reduce the number of context shifts but you quickly hit the law of diminishing returns. If you used a LIMIT of 1000 rather than 100, you'd eliminate 99.9% of the context shifts rather than 99%. That would mean that your collection was using 10x more PGA memory, however. And it would only eliminate 0.18% more elapsed time in our hypothetical example. You very quickly reach a point where the additional memory you're using adds more time than you save by eliminating additional context shifts. In general, a LIMIT somewhere between 100 and 1000 is likely to be the sweet spot.
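If you want to see where that sweet spot sits on your own system, a rough timing harness is easy to knock together. This is only a sketch: it reuses the hypothetical source_table from the example, times the fetch side only, relies on DBMS_UTILITY.GET_TIME (hundredths of a second), and needs SERVEROUTPUT on to show the results.
DECLARE
  CURSOR c IS SELECT * FROM source_table;
  TYPE nt_type    IS TABLE OF source_table%rowtype;
  TYPE limit_list IS TABLE OF PLS_INTEGER;
  l_limits limit_list := limit_list( 10, 100, 1000, 10000 );
  l_arr    nt_type;
  l_start  PLS_INTEGER;
BEGIN
  FOR i IN 1 .. l_limits.COUNT
  LOOP
    l_start := DBMS_UTILITY.GET_TIME;
    OPEN c;
    LOOP
      FETCH c BULK COLLECT INTO l_arr LIMIT l_limits(i);
      EXIT WHEN l_arr.COUNT = 0;
      -- the FORALL insert would go here; it is omitted so that only the
      -- fetch side (and its context shifts) is being timed
    END LOOP;
    CLOSE c;
    DBMS_OUTPUT.PUT_LINE( 'LIMIT ' || l_limits(i) || ': '
                          || ( DBMS_UTILITY.GET_TIME - l_start ) || ' hsec' );
  END LOOP;
END;
/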
Of course, in this example, it would be more efficient still to eliminate all context shifts and do everything in a single SQL statement
INSERT INTO dest_table( col1, col2, ... , colN )
SELECT col1, col2, ... , colN
FROM source_table;
It would only make sense to resort to PL/SQL in the first place if you're doing some sort of manipulation of the data from the source table that you can't reasonably implement in SQL.
Additionally, I used an explicit cursor in my example intentionally. If you are using implicit cursors, in recent versions of Oracle, you get the benefits of a BULK COLLECT with a LIMIT of 100 implicitly. There is another StackOverflow question that discusses the relative performance benefits of implicit and explicit cursors with bulk operations that goes into more detail about those particular wrinkles.
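For comparison, the implicit-cursor version that benefits from that automatic 100-row array fetch is just a cursor FOR loop (a sketch, with col1/col2 standing in for the real column list). Note that the optimisation applies only to the FETCH side; the INSERT inside the loop is still executed row by row, which is why the explicit BULK COLLECT / FORALL version above remains the faster PL/SQL option.
BEGIN
  FOR r IN ( SELECT col1, col2 FROM source_table )
  LOOP
    -- the fetch behind this loop is array-optimised (100 rows at a time)
    -- when PLSQL_OPTIMIZE_LEVEL is 2 or higher (the default),
    -- but each INSERT below is still a separate context shift
    INSERT INTO dest_table( col1, col2 )
    VALUES( r.col1, r.col2 );
  END LOOP;
END;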
As I understand this, there are two engines involved, the PL/SQL engine and the SQL engine. Executing a query that makes use of one engine at a time is more efficient than switching between the two.
Example:
INSERT INTO t VALUES(1)
is processed by the SQL engine, while
FOR Lcntr IN 1..20 LOOP
NULL;
END LOOP;
is executed by the PL/SQL engine.
If you combine the two statements above, putting the INSERT inside the loop,
FOR Lcntr IN 1..20 LOOP
INSERT INTO t VALUES(1);
END LOOP;
Oracle will switch between the two engines for each of the 20 iterations.
In this case a bulk bind (FORALL) is recommended: the rows are prepared by the PL/SQL engine and handed to the SQL engine in a single switch.
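A minimal FORALL version of that loop might look like this (a sketch, assuming t has a single numeric column as in the example): the 20 values are built up in the PL/SQL engine and then handed to the SQL engine in one go.
DECLARE
  TYPE num_list IS TABLE OF NUMBER;
  l_vals num_list := num_list();
BEGIN
  FOR Lcntr IN 1 .. 20
  LOOP
    l_vals.EXTEND;
    l_vals( Lcntr ) := 1;          -- build the values in PL/SQL first
  END LOOP;
  FORALL i IN 1 .. l_vals.COUNT    -- one switch to the SQL engine for all 20 rows
    INSERT INTO t VALUES ( l_vals(i) );
END;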
How do I query a bulk collection? If, for example, I have
select name
bulk collect into namesValues
from table1
where namesValues is a dbms_sql.varchar2_table.
Now, I have another table XYZ which contains
name is_valid
v
h
I want to update is_valid to 'Y' if name is in table1 else 'N'. Table1 has 10 million rows. After bulk collecting I want to execute
update xyz
set is_valid ='Y'
where name in namesValues.
How do I query namesValues? Or is there another option? Table1 has no index.
Please help.
As Tom Kyte (Oracle Corp. Vice President) says:
My mantra, that I'll be sticking with thank you very much, is:
You should do it in a single SQL statement if at all possible.
If you cannot do it in a single SQL Statement, then do it in PL/SQL.
If you cannot do it in PL/SQL, try a Java Stored Procedure.
If you cannot do it in Java, do it in a C external procedure.
If you cannot do it in a C external routine, you might want to
seriously think about why it is you need to do it…
think in sets...
learn all there is to learn about SQL...
You should perform your update in SQL if you can. If you need to add an index to do this then that might be preferable to looping through a collection populated with BULK COLLECT.
If, however, this is some sort of assignment...
you should specify it as such, but here's how you would do it.
I have assumed that your DB server does not have the capacity to hold 10 million records in memory so rather than BULK COLLECTing all 10 million records in one go I have put the BULK COLLECT into a loop to reduce your memory overheads. If this is not the case then you can omit the bulk collect loop.
DECLARE
c_bulk_limit CONSTANT PLS_INTEGER := 500000;
--
CURSOR names_cur
IS
SELECT name
FROM table1;
--
TYPE namesValuesType IS TABLE OF table1.name%TYPE
INDEX BY PLS_INTEGER;
namesValues namesValuesType;
BEGIN
-- Populate the collection
OPEN names_cur;
LOOP
-- Fetch the records in a loop limiting them
-- to the c_bulk_limit amount at a time
FETCH names_cur BULK COLLECT INTO namesValues
LIMIT c_bulk_limit;
-- Process the records in your collection
FORALL x IN INDICES OF namesValues
UPDATE xyz
SET is_valid ='Y'
WHERE name = namesValues(x)
AND is_valid != 'Y';
-- Set up loop exit criteria
EXIT WHEN namesValues.COUNT < c_bulk_limit;
END LOOP;
CLOSE names_cur;
-- You want to update all remaining rows to 'N'
UPDATE xyz
SET is_valid ='N'
WHERE is_valid IS NULL;
EXCEPTION
WHEN others
THEN
IF names_cur%ISOPEN
THEN
CLOSE names_cur;
END IF;
-- Re-raise the exception;
RAISE;
END;
/
Depending upon your rollback segment sizes etc. you may want to issue interim commits within the bulk collect loop but be aware that you will not then be able to rollback these changes. I deliberately haven't added any COMMITs to this so you can choose where to put them to suit your system.
You also might want to change the size of the c_bulk_limit constant depending upon the resources available to you.
Your update will still cause you problems if the xyz table is large and there is no index on the name column.
Hope it helps...
"Table1 has no index."
Well there's your problem right there. Why not? Put an index on TABLE1.NAME and use a normal SQL UPDATE to amend the data in XYZ.
Trying to solve this problem with bulk collect is not the proper approach.
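In other words, the whole requirement is two statements (a sketch; the index name is arbitrary, and it assumes NAME is directly comparable between the two tables):
CREATE INDEX table1_name_idx ON table1 ( name );

UPDATE xyz x
   SET x.is_valid = ( SELECT NVL( MAX( 'Y' ), 'N' )   -- 'Y' if at least one match in table1, else 'N'
                        FROM table1 t
                       WHERE t.name = x.name );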
A PL/SQL routine moves older versions of data from a transaction table to a history table of the same structure, as an archive kept for a certain period:
for each record
insert into tab_hist (select older_versions of current row);
delete from tab (select older_versions of current row);
END
PS: earlier we were not archiving (no insert), but after adding the insert it has doubled the run time. So can we accomplish the insert and delete with a single SELECT statement, as there is a large amount of data to be processed, across multiple tables?
This is a batch operation, right? In which case you should avoid Row By Row and use set processing. SQL is all about The Joy Of Sets.
Oracle has fantastic bulk SQL processing capabilities. The pseudo code you posted would look something like this:
declare
cursor c_oldrecs is
select * from your_table
where criterion between some_date and some_other_date;
type rec_nt is table of your_table%rowtype;
oldrecs_coll rec_nt;
begin
open c_oldrecs;
loop
fetch c_oldrecs bulk collect into oldrecs_coll limit 1000;
exit when oldrecs_coll.count() = 0;
forall i in oldrecs_coll.first() .. oldrecs_coll.last()
insert into your_table_hist
values oldrecs_coll(i);
forall i in oldrecs_coll.first() .. oldrecs_coll.last()
delete from your_table
where pk_col = oldrecs_coll(i).pk_col;
end loop;
close c_oldrecs;
end;
/
This bulk processing is faster because it ships a thousand rows between the PL/SQL and SQL engines at a time, instead of switching between PL/SQL and SQL for every single row. The LIMIT 1000 clause is there to prevent a really huge selection from blowing the PGA. This safeguard may not be necessary in your case, or perhaps you can work with a higher value.
I think your current implementation is wrong. It is better to keep only the current version in the live table, and to keep all the historical versions in a separate table from the off. Use triggers to maintain the history as part of every transaction.
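A history-maintaining trigger along those lines might look something like this (only a sketch: the trigger name is made up, pk_col is the only column we know about from the code above, and your real history table will need the full column list plus any audit columns you want):
CREATE OR REPLACE TRIGGER your_table_hist_trg
  BEFORE UPDATE OR DELETE ON your_table
  FOR EACH ROW
BEGIN
  -- capture the outgoing version of the row before it is changed or removed
  INSERT INTO your_table_hist ( pk_col )
  VALUES ( :OLD.pk_col );
END;
/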
It may be that the slowness you are seeing is due to the logic that selects which rows are to be moved. If so, you might get better results by doing the select once to get the rowids into a nested table in memory, then doing the insert and the delete based on that list; or alternatively, driving your loop with a query that selects the rows to be moved.
You might instead consider creating a trigger on insert that will move the existing rows that "match" the row being inserted. This will slow down the inserts somewhat, but would mean you don't need any process to move the old rows in bulk.
If you are on Enterprise edition with the partitioning option, look at partition exchange.
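Very roughly, the partition exchange route looks like this (a sketch; it assumes TAB is partitioned so that the rows to be archived sit in their own partition, called P_OLD here, and that the staging table matches TAB column for column):
-- empty staging table with the same shape as TAB
CREATE TABLE tab_stage AS SELECT * FROM tab WHERE 1 = 0;

-- swap the old partition's segment with the empty staging table:
-- a data dictionary operation, so no row-by-row insert or delete
ALTER TABLE tab EXCHANGE PARTITION p_old WITH TABLE tab_stage;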
As simple as this:
CREATE TABLE BACKUP_TAB AS SELECT * FROM TAB;
If you are deleting a lot of rows you will be hitting your undo tablespace, and a delete that removes, say, 100k rows can cause performance issues. You are better off deleting in batches of, say, 5k rows at a time and committing.
BEGIN
-- The WHERE condition on the insert and the delete must be the same,
-- and it must exclude rows already copied to BACKUP_TAB, otherwise each
-- pass of the insert loop picks up the same rows again.
loop
INSERT INTO BACKUP_TAB SELECT * FROM TAB WHERE 1=1 and rownum < 5000; --Your condition here
exit when SQL%rowcount < 4999;
commit;
end loop;
loop
DELETE FROM TAB
where 1=1 --Your condition here
and rownum < 5000;
exit when SQL%rowcount < 4999;
commit;
end loop;
commit;
END;