Optimizing SQL by using a temporary table in Oracle

I have a data-cleanup procedure that clears the card column in the same rows of two tables.
Both UPDATE statements use the same subquery to detect which rows should be updated.
UPDATE table_1 SET card = NULL WHERE id in
(select id from sub_table WHERE /* complex clause here */);
UPDATE table_2 SET card = NULL WHERE id in
(select id from sub_table WHERE /* complex clause here */);
Is using an Oracle temporary table a good solution for optimizing this code?
CREATE TEMPORARY TABLE tmp_sub_table AS
select id from sub_table WHERE /* complex clause here */;
UPDATE table_1 SET card = NULL WHERE id in (select * from tmp_sub_table);
UPDATE table_2 SET card = NULL WHERE id in (select * from tmp_sub_table);
Should I use a local temporary table or a global temporary table?

Global Temporary Tables are persistent data structures. When we INSERT, the data is written to disk; when we SELECT, the data is read from disk. So that's quite a lot of disk I/O: the cost saving from running the same query twice must be greater than the cost of all those writes and reads.
One thing to watch out for is that GTTs are built in a temporary tablespace, so you might get contention with other long-running processes that are doing sorts, etc. It's a good idea to have a separate temporary tablespace just for GTTs, but not many DBAs do this.
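For reference, Oracle's syntax differs from the CREATE TEMPORARY TABLE shown in the question: the GTT definition is created once, up front, and only its contents are per-session (Oracle has no local temporary tables before the private temporary tables of 18c). A sketch of that approach:
-- one-time DDL, not per run; ON COMMIT DELETE ROWS empties it at commit
CREATE GLOBAL TEMPORARY TABLE tmp_sub_table (id NUMBER)
ON COMMIT DELETE ROWS;
-- then, inside the cleanup procedure:
INSERT INTO tmp_sub_table
SELECT id FROM sub_table WHERE /* complex clause here */;
UPDATE table_1 SET card = NULL WHERE id IN (SELECT id FROM tmp_sub_table);
UPDATE table_2 SET card = NULL WHERE id IN (SELECT id FROM tmp_sub_table);
COMMIT; -- also clears tmp_sub_table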
An alternative solution would be to use a collection to store subsets of the records in memory and use bulk processing.
declare
    l_ids sys.odcinumberlist; -- built-in nested table type of NUMBER
    cursor l_cur is
        select id from sub_table where /* complex clause here */
        order by id;
begin
    open l_cur;
    loop
        fetch l_cur bulk collect into l_ids limit 5000;
        exit when l_ids.count() = 0;
        update table_1
        set card = null
        where id member of l_ids;
        update table_2
        set card = null
        where id member of l_ids;
    end loop;
    close l_cur;
end;
"updating many rows with one update statement ... works much faster than updating separately using Looping over cursor"
That is the normal advice. But this is a bulk operation: it updates five thousand rows at a time, so it's much faster than row-by-row processing. The size of the batch is governed by the BULK COLLECT ... LIMIT clause: you don't want to make the value too high, because the collection sits in session memory, but as you're only selecting one column, and a number at that, you can probably go higher.
As always, tuning is a matter of benchmarking. Have you established that running this sub-query twice is actually a high-cost operation?
select id from sub_table WHERE /* complex clause here */
If it seems too slow, you need to test other approaches and see whether they're faster. Maybe a Global Temporary Table will be faster than a bulk operation. Generally memory access is faster than disk access, but you need to see which works best for you.
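A minimal timing harness for that kind of benchmark (DBMS_UTILITY.GET_TIME returns hundredths of a second; substitute whichever statement you want to measure):
declare
    t0 pls_integer;
begin
    t0 := dbms_utility.get_time;
    -- run the candidate statement here, e.g. the sub-query or one of the UPDATEs
    dbms_output.put_line('elapsed: ' || (dbms_utility.get_time - t0) || ' cs');
end;
/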

Related

Query cost: Global Temporary Tables vs. Collections (Virtual Arrays)

I have a query whose results are stored in a GTT (Global Temporary Table) and in a Collection.
Selecting the data from the GTT again, I get a very small cost: 103.
SELECT
...
FROM my_table_gtt
JOIN table2 ...
JOIN table3 ...
But when switching this from a GTT to a Collection (VA - Virtual Array), the cost skyrockets (78,000), though the difference in execution times between the two is very small.
SELECT
...
FROM TABLE(CAST(my_table_va as my_table_tt))
JOIN table2 ...
JOIN table3 ...
My question is: why is there such a big difference in cost between the two approaches? To my knowledge, GTTs don't store table statistics, so why is it returning a better cost than the VA?
Global temporary tables can have statistics like any other table. In fact they are like any other table: they have data segments, just in the temporary tablespace.
In 11g the statistics are shared globally, so they sometimes cause issues with execution plans. In 12c they are session-based, so each session gets its own proper ones (if available).
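For example, a sketch of gathering stats on a GTT named TMP (the GLOBAL_TEMP_TABLE_STATS preference and its SESSION default exist from 12c onwards):
-- gather stats after loading the GTT; in 12c they are session-private
-- by default (DBMS_STATS preference GLOBAL_TEMP_TABLE_STATS = 'SESSION')
exec dbms_stats.gather_table_stats(user, 'TMP');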
The collection type cardinality is based on the DB block size; for the default 8 kB block it is 8168. Collection content is stored in the PGA. It's quite common to use the CARDINALITY hint when using collection types in complex queries, to guide the optimizer. You can also use the extended optimizer interface to implement your own way of calculating the cost.
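For instance, a sketch using the (undocumented but widely used) CARDINALITY hint, where 100 is an assumed, realistic row count for the collection:
SELECT /*+ CARDINALITY(t 100) */ ...
FROM TABLE(CAST(my_table_va AS my_table_tt)) t
JOIN table2 ...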
Edit - added tests:
CREATE TYPE STRINGTABLE IS TABLE OF VARCHAR2(255);
CREATE GLOBAL TEMPORARY TABLE TMP (VALUE VARCHAR2(255));
INSERT INTO TMP SELECT 'Value' || LEVEL FROM DUAL CONNECT BY LEVEL <= 1000000;
DECLARE
x STRINGTABLE;
cnt NUMBER;
BEGIN
SELECT VALUE BULK COLLECT INTO x FROM TMP;
DBMS_OUTPUT.PUT_LINE(TO_CHAR(SYSTIMESTAMP, 'MI:SS.FF3'));
SELECT SUM(LENGTH(VALUE)) INTO cnt FROM TMP;
DBMS_OUTPUT.PUT_LINE(TO_CHAR(SYSTIMESTAMP, 'MI:SS.FF3'));
SELECT SUM(LENGTH(COLUMN_VALUE)) INTO cnt FROM TABLE(x);
DBMS_OUTPUT.PUT_LINE(TO_CHAR(SYSTIMESTAMP, 'MI:SS.FF3'));
END;
In this case the access to the GTT is about twice as fast as the access to the collection, circa 200 ms vs. 400 ms on my test machine. When I increased the number of rows to 10,000,000, I got ORA-22813: operand value exceeds system limits on the second query.
The most important difference between collections and GTTs in SQL is that the CBO (cost-based optimizer) has limitations for the TABLE function (kokbf$...); for example, JPPD (join predicate push-down) doesn't work with TABLE() functions.
Some workarounds: http://orasql.org/2019/05/30/workarounds-for-jppd-with-view-and-tablekokbf-xmltable-or-json_table-functions/

Why is Oracle RESULT_CACHE not reducing the number of LAST_CR_BUFFER gets?

I'm doing some tests with the Oracle result_cache and came across something that looks strange (to me, anyway).
I've created the following table:
create table CACHE_TEST(
COL1 int,
COL2 int
);
And inserted some dummy data into it:
insert into CACHE_TEST select * from(
select level as COL1, level*3 as COL2 from DUAL
connect by level <= 100
);
Now I run an Autotrace on the following query:
select * from CACHE_TEST;
As expected, it shows a normal full table scan. I then run an Autotrace on the following query several times, and expect it to be using the result cache:
select /*+ RESULT_CACHE */ * from CACHE_TEST;
The Autotrace shows that it is indeed using the cache, but the number of buffer gets and cost is exactly the same as the first query.
Interestingly, if I do some kind of aggregate, eg:
select /*+ RESULT_CACHE */ AVG(COL1) FROM CACHE_TEST;
It reduces the buffer gets to zero, but the cost is still the same.
Can anyone explain why:
1) the result cache doesn't seem to reduce the number of buffer gets unless you do an aggregate?
2) even when it does use the result cache, the cost is still reportedly the same (even though I see a marked performance increase)?
If you're selecting all the data from the table, why would you expect it to require fewer reads to read all that data from the table (which is likely completely in the buffer cache) rather than from the result cache? Either way, you're reading the same number of blocks and getting the same amount of data back.
The result cache is really helpful when your query is doing some sort of expensive calculation. That could be an aggregate-- reading a single cached value is obviously more efficient than reading every row from a table. Or it could be a query that does a lot of work to figure out which subset of the table to read (joining through some other tables, for example).
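To illustrate, a sketch of a result-cached PL/SQL function over the test table (the function name is made up); the query in the body runs once and later calls are served from the cache:
CREATE OR REPLACE FUNCTION avg_col1 RETURN NUMBER
    RESULT_CACHE
IS
    l_avg NUMBER;
BEGIN
    -- executed only when the cached result is missing or invalidated
    SELECT AVG(COL1) INTO l_avg FROM CACHE_TEST;
    RETURN l_avg;
END;
/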

Optimal way to DELETE specified rows from Oracle

I have a project that needs to occasionally delete several tens of thousands of rows from one of six tables of varying sizes that hold about 30 million rows between them. Because of the structure of the data I've been given, I don't know which of the six tables holds any given row that needs to be deleted, so I have to run all deletes against all tables. I've built an INDEX on the ID column to try to speed things up, but it can be removed if that will help.
My problem is that I can't seem to find an efficient way to actually perform the delete. For the purposes of my testing I'm deleting 7384 rows from a single test table which has about 9400 rows. I've tested a number of possible query solutions in Oracle SQL Developer:
7384 separate DELETE statements took 203 seconds:
delete from TABLE1 where ID=1000001356443294;
delete from TABLE1 where ID=1000001356443296;
etc...
7384 separate SELECT statements took 57 seconds:
select ID from TABLE1 where ID=1000001356443294
select ID from TABLE1 where ID=1000001356443296
etc...
7384 separate DELETE from (SELECT) statements took 214 seconds:
delete from (select ID from TABLE1 where ID=1000001356443294);
delete from (select ID from TABLE1 where ID=1000001356443296);
etc...
1 SELECT statement that has 7384 OR clauses in the where took 127.4s:
select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...
1 DELETE from (SELECT) statement that has 7384 OR clauses in the where took 74.4s:
delete from (select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...)
While the last may be the fastest, upon further testing it's still very slow when scaled up from the 9000-row table to even just a 200,000-row table (which is still < 1% of the final tableset size), where the same statement takes 14 minutes to run. While > 50% faster per row, that still extrapolates to about a day when run against the full dataset. I have it on good authority that the piece of software we used to use for this task could do it in about 20 minutes.
So my questions are:
Is there a better way to delete?
Should I use a round of SELECT statements (i.e., like the second test) to discover which table any given row is in and then shoot off delete queries? Even that looks quite slow but...
Is there anything else I can do to speed the deletes up? I don't have DBA-level access or knowledge.
In advance of my questions being answered, this is how I'd go about it:
Minimize the number of statements you issue and the work each one does.
All scenarios assume you have a table of IDs (PURGE_IDS) to delete from TABLE_1, TABLE_2, etc.
Consider Using CREATE TABLE AS SELECT for really large deletes
If there's no concurrent activity and you're deleting 30+% of the rows in one or more of the tables, don't delete; perform a CREATE TABLE AS SELECT with the rows you wish to keep, and swap the new table out for the old table. INSERT /*+ APPEND */ ... NOLOGGING is surprisingly cheap if you can afford it. Even if you do have some concurrent activity, you may be able to use Online Table Redefinition to rebuild the table in-place. A sketch of the swap approach follows.
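A rough sketch of the keep-and-swap approach (names are hypothetical; indexes, constraints, and grants must be recreated on the new table before the swap):
CREATE TABLE table_1_keep NOLOGGING AS
    SELECT * FROM table_1 t
    WHERE NOT EXISTS (SELECT NULL FROM purge_ids p WHERE p.id = t.id);
-- recreate indexes, constraints, and grants on table_1_keep here
ALTER TABLE table_1 RENAME TO table_1_old;
ALTER TABLE table_1_keep RENAME TO table_1;
DROP TABLE table_1_old;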
Don't run DELETE statements you know won't delete any rows
If an ID value exists in at most one of the six tables, then keep track of which IDs you've deleted - and don't try to delete those IDs from any of the other tables.
CREATE TABLE TABLE1_PURGE NOLOGGING
AS
SELECT PURGE_IDS.ID FROM PURGE_IDS INNER JOIN TABLE_1 ON PURGE_IDS.ID = TABLE_1.ID;
DELETE FROM TABLE_1 WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DELETE FROM PURGE_IDS WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DROP TABLE TABLE1_PURGE;
and repeat.
Manage Concurrency if you have to
Another way is to use PL/SQL, looping over the tables and issuing a rowcount-limited delete statement. This is most likely appropriate if there's significant insert/update/delete concurrent load against the tables you're deleting from.
declare
    l_sql varchar2(4000);
begin
    for i in (select table_name from all_tables
              where table_name in ('TABLE_1', 'TABLE_2', ...)
              order by table_name)
    loop
        l_sql := 'delete from ' || i.table_name ||
                 ' where id in (select id from purge_ids) ' ||
                 ' and rownum <= 1000000';
        loop
            commit; -- committing first avoids COMMIT resetting SQL%ROWCOUNT before the test below
            execute immediate l_sql;
            exit when sql%rowcount <> 1000000; -- if we delete fewer than 1,000,000 rows,
        end loop;                              -- no more rows need to be deleted
    end loop;
    commit;
end;
Store all the to-be-deleted IDs in a table. Then there are three ways:
1) Loop through the IDs in that table and delete one row at a time, committing every X rows, where X can be 100 or 1000. This works in an OLTP environment and you can control the locks.
2) Use an Oracle bulk (FORALL) delete.
3) Use a correlated delete query (see the sketch below).
A single query is usually faster than multiple queries because of less context switching and possibly less parsing.
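A sketch of option 3, the correlated delete (assuming the IDs sit in a PURGE_IDS table as above):
DELETE FROM table_1 t
WHERE EXISTS (SELECT NULL FROM purge_ids p WHERE p.id = t.id);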
First, disabling the index during the deletion would be helpful.
Try a MERGE INTO statement:
1) Create a temp table with the IDs and an additional column from TABLE1, and test with the following:
MERGE INTO table1 tgt
USING (SELECT id, col1
       FROM test_merge_delete) src
ON (tgt.id = src.id)
WHEN MATCHED THEN
UPDATE
SET tgt.col1 = src.col1
DELETE
WHERE tgt.id = src.id
I have tried this code and it's working fine in my case.
DELETE FROM NG_USR_0_CLIENT_GRID_NEW WHERE rowid IN
( SELECT rid FROM
  (
    SELECT rowid AS rid, ROW_NUMBER() OVER (ORDER BY rowid DESC) RN
    FROM NG_USR_0_CLIENT_GRID_NEW
    WHERE wi_name = 'NB-0000001385-Process'
  )
  WHERE RN=2
);

bulk collect in oracle

How do I query a bulk collection? If, for example, I have
select name
bulk collect into namesValues
from table1
where namesValues is a dbms_sql.varchar2_table.
Now, I have another table XYZ which contains
name is_valid
v
h
I want to update is_valid to 'Y' if the name is in table1, else 'N'. Table1 has 10 million rows. After bulk collecting I want to execute
update xyz
set is_valid ='Y'
where name in namesValues.
How do I query namesValues? Or is there another option? Table1 has no index.
Please help.
As Tom Kyte (Oracle Corp. Vice President) says:
My mantra, that I'll be sticking with thank you very much, is:
You should do it in a single SQL statement if at all possible.
If you cannot do it in a single SQL Statement, then do it in PL/SQL.
If you cannot do it in PL/SQL, try a Java Stored Procedure.
If you cannot do it in Java, do it in a C external procedure.
If you cannot do it in a C external routine, you might want to
seriously think about why it is you need to do it…
think in sets...
learn all there is to learn about SQL...
You should perform your update in SQL if you can (see the sketch below). If you need to add an index to do this, then that might be preferable to looping through a collection populated with BULK COLLECT.
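A single-statement version might look like this (a sketch; it assumes name matches exactly between the tables and touches every row of xyz once):
update xyz
   set is_valid = case
                     when exists (select null from table1 t where t.name = xyz.name)
                     then 'Y'
                     else 'N'
                  end;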
If, however, this is some sort of assignment...
You should specify it as such, but here's how you would do it.
I have assumed that your DB server does not have the capacity to hold 10 million records in memory, so rather than BULK COLLECTing all 10 million records in one go, I have put the BULK COLLECT into a loop to reduce your memory overheads. If this is not the case, you can omit the bulk collect loop.
DECLARE
   c_bulk_limit CONSTANT PLS_INTEGER := 500000;
   --
   CURSOR names_cur
   IS
      SELECT name
        FROM table1;
   --
   TYPE namesValuesType IS TABLE OF table1.name%TYPE
      INDEX BY PLS_INTEGER;
   namesValues namesValuesType;
BEGIN
   -- Populate the collection
   OPEN names_cur;
   LOOP
      -- Fetch the records in a loop, limiting them
      -- to the c_bulk_limit amount at a time
      FETCH names_cur BULK COLLECT INTO namesValues
      LIMIT c_bulk_limit;
      -- Process the records in your collection
      FORALL x IN INDICES OF namesValues
         UPDATE xyz
            SET is_valid = 'Y'
          WHERE name = namesValues(x)
            AND (is_valid != 'Y' OR is_valid IS NULL); -- NULL-safe: is_valid starts out unset
      -- Set up loop exit criteria
      EXIT WHEN namesValues.COUNT < c_bulk_limit;
   END LOOP;
   CLOSE names_cur;
   -- You want to update all remaining rows to 'N'
   UPDATE xyz
      SET is_valid = 'N'
    WHERE is_valid IS NULL;
EXCEPTION
   WHEN OTHERS
   THEN
      IF names_cur%ISOPEN
      THEN
         CLOSE names_cur;
      END IF;
      -- Re-raise the exception
      RAISE;
END;
/
Depending upon your rollback segment sizes etc., you may want to issue interim commits within the bulk collect loop, but be aware that you will not then be able to roll back those changes. I deliberately haven't added any COMMITs to this so you can choose where to put them to suit your system.
You also might want to change the size of the c_bulk_limit constant depending upon the resources available to you.
Your update will still cause you problems if the xyz table is large and there is no index on the name column.
Hope it helps...
"Table1 has no index."
Well there's your problem right there. Why not? Put an index on TABLE1.NAME and use a normal SQL UPDATE to amend the data in XYZ.
Trying to solve this problem with bulk collect is not the proper approach.

Oracle pl sql 10g - move set of rows from a table to a history table with same structure

A PL/SQL job moves older versions of data from a transaction table to a history table of the same structure, and archives them for a certain period:
for each record
insert into tab_hist (select older_versions of current row);
delete from tab (select older_versions of current row);
END
PS: earlier we were not archiving (no insert), but after adding the insert the run time has doubled. So can we accomplish the insert and delete with a single SELECT statement? There is a large amount of data to be processed, and across multiple tables.
This is a batch operation, right? In which case you should avoid Row By Row and use set processing. SQL is all about The Joy Of Sets.
Oracle has fantastic bulk SQL processing capabilities. The pseudo-code you posted would look something like this:
declare
    cursor c_oldrecs is
        select * from your_table
        where criterion between some_date and some_other_date;
    type rec_nt is table of your_table%rowtype;
    oldrecs_coll rec_nt;
begin
    open c_oldrecs;
    loop
        fetch c_oldrecs bulk collect into oldrecs_coll limit 1000;
        exit when oldrecs_coll.count() = 0;
        forall i in oldrecs_coll.first() .. oldrecs_coll.last()
            insert into your_table_hist
            values oldrecs_coll(i);
        forall i in oldrecs_coll.first() .. oldrecs_coll.last()
            delete from your_table
            where pk_col = oldrecs_coll(i).pk_col;
    end loop;
    close c_oldrecs;
end;
/
This bulk processing is faster because each FORALL sends a thousand rows' worth of work to the SQL engine at a time, instead of switching between PL/SQL and SQL a thousand times. The LIMIT 1000 clause is there to prevent a really huge selection blowing the PGA. This safeguard may not be necessary in your case, or perhaps you can work with a higher value.
I think your current implementation is wrong. It is better to keep only the current version in the live table, and to keep all the historical versions in a separate table from the off. Use triggers to maintain the history as part of every transaction; a sketch follows.
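A minimal sketch of such a trigger (table and column names are made up; col1 and archived_at are illustrative):
create or replace trigger your_table_hist_trg
    before update on your_table
    for each row
begin
    -- archive the outgoing version before it is overwritten
    insert into your_table_hist (pk_col, col1, archived_at)
    values (:old.pk_col, :old.col1, systimestamp);
end;
/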
It may be that the slowness you are seeing is due to the logic that selects which rows are to be moved. If so, you might get better results by doing the select once to get the rowids into a nested table in memory, then doing the insert and the delete based on that list; or alternatively, driving your loop with a query that selects the rows to be moved.
You might instead consider creating a trigger on insert that will move the existing rows that "match" the row being inserted. This will slow down the inserts somewhat, but would mean you don't need any process to move the old rows in bulk.
If you are on Enterprise Edition with the partitioning option, look at partition exchange; a sketch follows.
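Roughly (a sketch assuming TAB is range-partitioned by the archiving date and a staging table TAB_STAGE matches the partition's structure; the partition name is hypothetical):
-- swaps the partition's rows out of TAB and into TAB_STAGE as a
-- dictionary operation, with no row-by-row DML
ALTER TABLE tab EXCHANGE PARTITION p_2011_q4 WITH TABLE tab_stage
    INCLUDING INDEXES WITHOUT VALIDATION;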
As simple as this:
CREATE TABLE BACKUP_TAB AS SELECT * FROM TAB;
If you are deleting a lot of rows you will be hitting your undo tablespace, and a delete which removes, say, 100k rows can cause performance issues. You are better off deleting in batches of, say, 5k rows at a time and committing.
BEGIN
    -- The WHERE condition on the insert and on the delete must be the same.
    -- The insert must also exclude rows already copied, otherwise the first
    -- loop keeps re-inserting the same batch forever.
    loop
        INSERT INTO BACKUP_TAB
        SELECT * FROM TAB t
        WHERE 1=1 -- your condition here
        AND NOT EXISTS (SELECT NULL FROM BACKUP_TAB b WHERE b.id = t.id) -- assumes a key column, here called id
        AND rownum < 5000;
        exit when SQL%rowcount < 4999;
        commit;
    end loop;
    loop
        DELETE FROM TAB
        where 1=1 -- your condition here
        and rownum < 5000;
        exit when SQL%rowcount < 4999;
        commit;
    end loop;
    commit;
END;
