Optimal way to DELETE specified rows from Oracle - oracle

I have a project that needs to occasionally delete several tens of thousands of rows from one of six tables of varying sizes but that have about 30million rows between them. Because of the structure of the data I've been given, I don't know which of the six tables has the row that needs to be deleted in it so I have to run all deletes against all tables. I've built an INDEX against the ID column to try and speed things up, but it can be removed if that'll speed things up.
My problem is, that I can't seem to find an efficient way to actually perform the delete. For the purposes of my testing I'm running 7384 delete rows against single test-table which has about 9400 rows. I've tested a number of possible query solutions in Oracle SQL Developer:
7384 separate DELETE statements took 203 seconds:
delete from TABLE1 where ID=1000001356443294;
delete from TABLE1 where ID=1000001356443296;
etc...
7384 separate SELECT statements took 57 seconds:
select ID from TABLE1 where ID=1000001356443294
select ID from TABLE1 where ID=1000001356443296
etc...
7384 separate DELETE from (SELECT) statements took 214 seconds:
delete from (select ID from TABLE1 where ID=1000001356443294);
delete from (select ID from TABLE1 where ID=1000001356443296);
etc...
1 SELECT statement that has 7384 OR clauses in the where took 127.4s:
select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...
1 DELETE from (SELECT) statement that has 7384 OR clauses in the where took 74.4s:
delete from (select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...)
While the last may be the fastest, upon further testing its still very slow when scaled up from the 9000 row table to even just a 200,000 row table (which is still < 1% of the final tableset size) where the same statement takes 14mins to run. While > 50% faster per row, that still extrapolates up to about a day when being run against the full dataset. I have it on good authority that the piece of software we used to us to do this task could do it in about 20mins.
So my questions are:
Is there a better way to delete?
Should I use a round of SELECT statements (i.e., like the second test) to discover which table any given row is in and then shoot off delete queries? Even that looks quite slow but...
Is there anything else I can do to speed the deletes up? I don't have DBA-level access or knowledge.

In advance of my questions being answered, this is how I'd go about it:
Minimize the number of statements and the work they do issued in relative terms.
All scenarios assume you have a table of IDs (PURGE_IDS) to delete from TABLE_1, TABLE_2, etc.
Consider Using CREATE TABLE AS SELECT for really large deletes
If there's no concurrent activity, and you're deleting 30+ % of the rows in one or more of the tables, don't delete; perform a create table as select with the rows you wish to keep, and swap the new table out for the old table. INSERT /*+ APPEND */ ... NOLOGGING is surprisingly cheap if you can afford it. Even if you do have some concurrent activity, you may be able to use Online Table Redefinition to rebuild the table in-place.
Don't run DELETE statements you know won't delete any rows
If an ID value exists in at most one of the six tables, then keep track of which IDs you've deleted - and don't try to delete those IDs from any of the other tables.
CREATE TABLE TABLE1_PURGE NOLOGGING
AS
SELECT ID FROM PURGE_IDS INNER JOIN TABLE_1 ON PURGE_IDS.ID = TABLE_1.ID;
DELETE FROM TABLE1 WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DELETE FROM PURGE_IDS WHERE ID IN (SELECT ID FROM TABLE1_PURGE);
DROP TABLE TABLE1_PURGE;
and repeat.
Manage Concurrency if you have to
Another way is to use PL/SQL looping over the tables, issuing a rowcount-limited delete statement. This is most likely appropriate if there's significant insert/update/delete concurrent load against the tables you're running the deletes against.
declare
l_sql varchar2(4000);
begin
for i in (select table_name from all_tables
where table_name in ('TABLE_1', 'TABLE_2', ...)
order by table_name);
loop
l_sql := 'delete from ' || i.table_name ||
' where id in (select id from purge_ids) ' ||
' and rownum <= 1000000';
loop
commit;
execute immediate l_sql;
exit when sql%rowcount <> 1000000; -- if we delete less than 1,000,000
end loop; -- no more rows need to be deleted!
end loop;
commit;
end;

Store all the to be deleted ID's into a table. Then there are 3 ways.
1) loop through all the ID's in the table, then delete one row at a time for X commit interval. X can be a 100 or 1000. It works on OLTP environment and you can control the locks.
2) Use Oracle Bulk Delete
3) Use correlated delete query.
Single query is usually faster than multiple queries because of less context switching, and possibly less parsing.

First, disabling the index during the deletion would be helpful.
Try with a MERGE INTO statement :
1) create a temp table with IDs and an additional column from TABLE1 and test with the following
MERGE INTO table1 src
USING (SELECT id,col1
FROM test_merge_delete) tgt
ON (src.id = tgt.id)
WHEN MATCHED THEN
UPDATE
SET src.col1 = tgt.col1
DELETE
WHERE src.id = tgt.id

I have tried this code and It's working fine in my case.
DELETE FROM NG_USR_0_CLIENT_GRID_NEW WHERE rowid IN
( SELECT rowid FROM
(
SELECT wi_name, relationship, ROW_NUMBER() OVER (ORDER BY rowid DESC) RN
FROM NG_USR_0_CLIENT_GRID_NEW
WHERE wi_name = 'NB-0000001385-Process'
)
WHERE RN=2
);

Related

Optimizing SQL by using temporary table in Oracle

I have a data cleanup-er procedure which deletes the same data from the card rows of two tables.
Both of these update statement use the same subQuery for detecting which rows should be updated.
UPDATE table_1 SET card = NULL WHERE id in
(select id from sub_table WHERE /* complex clause here */);
UPDATE table_2 SET card = NULL WHERE id in
(select id from sub_table WHERE /* complex clause here */);
Is using Oracle Temporary table good solution for optimizing my code?
CREATE TEMPORARY TABLE tmp_sub_table AS
select id from sub_table WHERE /* complex clause here */;
UPDATE table_1 SET card = NULL WHERE id in (select * from tmp_sub_table);
UPDATE table_2 SET card = NULL WHERE id in (select * from tmp_sub_table);
Should I use Local temporary table or Global Temporary table?
Global Temporary Tables are persistent data structures. When we INSERT the data is written to disk, when we SELECT the data is read from disk. So that's quite a lot of Disk I/O: the cost saving from running the same query twice must be greater than the cost of all those writes and reads.
One thing to watch out for is that GTTs are built on a Temporary Tablespace, so you might get contention with other long running processes which are doing sorts, etc. It's a good idea to have a separate Temporary Tablespace, just for GTTs but not many DBAs do this.
An alternative solution would be to use a collection to store subsets of the records in memory and use bulk processing.
declare
l_ids sys.ocinumberlist;
cursor l_cur is
select id from sub_table WHERE /* complex clause here */
order by id
;
begin
open lcur;
loop
fetch lcur bulk collect into l_ids limit 5000;
exit when l_ids.count() = 0;
update table1
set card=null
where id member of l_ids;
update table2
set card=null
where id member of l_ids;
end loop;
end;
"updating many rows with one update statement ... works much faster than updating separately using Looping over cursor"
That is the normal advice. But this is a bulk operation: it is updating five thousand rows at a time, so it's faster than row-by-row. The size of the batch is governed by the BULK COLLECT ... LIMIT clause: you don't want to make the value too high because the collection is in session memory but as you're only select one column - and a number - maybe you can make it higher.
As always tuning is a matter of benchmarking. Have you established that running this sub-query twice is a high-cost operation?
select id from sub_table WHERE /* complex clause here */
If it seems too slow you need to test other approaches and see whether they're faster. Maybe a Global Temporary Table is faster than a bulk operation. Generally memory access is faster than disk access, but you need to see which works best for you.

Can not improve bulk delete

I am using Java with mybatis.
I have a query like this and I need to execute this for 2000 values on key_b. That means I need to run the sql for 2000 times. Which is reasonably slow.
DELETE FROM my_table
WHERE key_a = xxx
AND key_b = yyy
Now I came up with another solution, this time I am sending 1000 values in IN clause for key_b. Which means only two query I am executing. I was expecting this one to be faster at least. But this seems to be even slower than the above one. Here is the sql.
DELETE FROM my_table
WHERE key_a = xxxx
AND key_b IN (y1, y2, ... y1000)
For more information, the key_b is the Primary Key. And the key_a is a Foreign key and has an Index.
Another thing, I've tried to take out the session and make a commit after all the sqls are executed. But It didn't improve that much.
you can use temp table for this:
I mean if you have a table which has id column.
And then You can insert your values to that table like this:
insert into temp_table
select 1 from dual -- your ids
union all
select 2 from dual
union all
select 3 from dual
union all
......
after you fill your temp_table you can run just this:
DELETE FROM my_table
WHERE key_a = xxxx
AND key_b IN
(
select id from temp_table
);
I recommend sticking with 1st approach: called prepared Delete statement in a Java loop over id collection. Off course with ExecutorType REUSE or BATCH, so that statement is prepared once and run for every record.
Furthermore, I discourage trying to bind thousands of parameters.
Anyway, I fear this is the best you can do since Delete operation will check integrity constraints, probably update index, for every record. That is not "bulked".

A fast query that selects the number of rows in each table

I want a query that selects the number of rows in each table
but they are NOT updated statistically .So such query will not be accurate:
select table_name, num_rows from user_tables
i want to select several schema and each schema has minimum 500 table some of them contain a lot of columns . it will took for me days if i want to update them .
from the site ask tom he suggest a function includes this query
'select count(*)
from ' || p_tname INTO l_columnValue;
such query with count(*) is really slow and it will not give me fast results.
Is there a query that can give me how many rows are in table in a fast way ?
You said in a comment that you want to delete (drop?) empty tables. If you don't want an exact count but only want to know if a table is empty you can do a shortcut count:
select count(*) from table_name where rownum < 2;
The optimiser will stop when it reaches the first row - the execution plan shows a 'count stopkey' operation - so it will be fast. It will return zero for an empty table, and one for a table with any data - you have no idea how much data, but you don't seem to care.
You still have a slight race condition between the count and the drop, of course.
This seems like a very odd thing to want to do - either your application uses the table, in which case dropping it will break something even if it's empty; or it doesn't, in which case it shouldn't matter whether it has (presumably redundant) and it can be dropped regardless. If you think there might be confusion, that sounds like your source (including DDL) control needs some work, maybe?
To check if either table in two schemas have a row, just count from both of them; either with a union:
select max(c) from (
select count(*) as c from schema1.table_name where rownum < 2
union all
select count(*) as c from schema2.table_name where rownum < 2
);
... or with greatest and two sub-selects, e.g.:
select greatest(
(select count(*) from schema1.table_name where rownum < 2),
(select count(*) from schema2.table_name where rownum < 2)
) from dual;
Either would return one if either table has any rows, and would only return zero f they were both empty.
Full Disclosure: I had originally suggested a query that specifically counts a column that's (a) indexed and (b) not null. #AlexPoole and #JustinCave pointed out (please see their comments below) that Oracle will optimize a COUNT(*) to do this anyway. As such, this answer has been altered significantly.
There's a good explanation here for why User_Tables shouldn't be used for accurate row counts, even when statistics are up to date.
If your tables have indexes which can be used to speed up the count by doing an index scan rather than a table scan, Oracle will use them. This will make the counts faster, though not by any means instantaneous. That said, this is the only way I know to get an accurate count.
To check for empty (zero row) tables, please use the answer posted by Alex Poole.
You could make a table to hold the counts of each table. Then, set a trigger to run on INSERT for each of the tables you're counting that updates the main table.
You'd also need to include a trigger for DELETE.

Oracle pl sql 10g - move set of rows from a table to a history table with same structure

PL SQL moves older versions of data from a transaction table to a history table of same structure and archive for a certain period -
for each record
insert into tab_hist (select older_versions of current row);
delete from tab (select older_versions of current row);
END
ps: earlier we were not archiving(no insert) - but after adding the insert it has doubled the run time - so can we accomplish insert and delete with a single select statement? as there is large data to be processed and across multiple table
This is a batch operation, right? In which case you should avoid Row By Row and use set processing. SQL is all about The Joy Of Sets.
Oracle has fantastic bulk SQL processing capabilities. The pseudo code you paosted would look something like this:
declare
cursor c_oldrecs is
select * from your_table
where criterion between some_date and some_other_date;
type rec_nt is table of your_table%rowtype;
oldrecs_coll rec_nt;
begin
open c_oldrecs;
loop
fetch c_oldrecs into oldrecs_coll limit 1000;
exit when oldrecs_coll.count() = 0;
forall i in oldrecs_coll.first() oldrecs_coll.last()
insert into your_table_hist
values oldrecs_coll(i);
forall i in oldrecs_coll.first() oldrecs_coll.last()
delete from your_table
where pk_col = oldrecs_coll(i).pk_col;
end loop;
end;
/
This bulk processing is faster because it sends one thousand statements to the database at a time, instead of switching between PL/SQL and SQL one thousand times. The LIMIT 1000 clause is there to prevent a really huge selection blowing the PGA. This safeguard may not be necessary in your case, or perhaps you can work with a higher value.
I think your current implementation is wrong. It is better to keep only the current version in the live table, and to keep all the historical versions in a separate table from the off. Use triggers to maintain the history as part of every transaction.
It may be that the slowness you are seeing is due to the logic that selects which rows are to be moved. If so, you might get better results by doing the select once to get the rowids into a nested table in memory, then doing the insert and the delete based on that list; or alternatively, driving your loop with a query that selects the rows to be moved.
You might instead consider creating a trigger on insert that will move the existing rows that "match" the row being inserted. This will slow down the inserts somewhat, but would mean you don't need any process to move the old rows in bulk.
If you are on Enterprise edition with the partitioning option, look at partition exchange.
As simple as this
CREATE BACKUP_TAB AS SELECT * FROM TAB
If you are deleting a lot of rows you will be hitting your undo tablespace and a delete which deletes say 100k rows can cause performance issues. You are better of deleting by batch say 5k rows at a time and committing.
BEGIN
-- Where condition on insert and delete must be the same
loop
INSERT INTO BACKUP_TAB SELECT * FROM TAB WHERE 1=1 and rownum < 5000; --Your condition here
exit when SQL%rowcount < 4999;
commit;
end loop;
loop
DELETE FROM TAB
where 1=1--Your condition here
and rownum < 5000;
exit when SQL%rowcount < 4999;
commit;
end loop;
commit;
END;

Oracle command hangs when using view for "WHERE x IN..." subquery

I'm working on a web service that fetches data from an oracle data source in chunks and passes it back to an indexing/search tool in XML format. I'm the C#/.NET guy, and am kind of fuzzy on parts of Oracle.
Our Oracle team gave us the following script to run, and it works well:
SELECT ROWID, [columns]
FROM [table]
WHERE ROWID IN (
SELECT ROWID
FROM (
SELECT ROWID
FROM [table]
WHERE ROWID > '[previous_batch_last_rowid]'
ORDER BY ROWID
)
WHERE ROWNUM <= 10000
)
ORDER BY ROWID
10,000 rows is an arbitrary but reasonable chunk size and ROWID is sufficiently unique for our purposes to use as a UID since each indexing run hits only one table at a time. Bracketed values are filled in programmatically by the web service.
Now we're going to start adding views to the indexing, each of which will union a few separate tables. Since ROWID would no longer function as a unique identifier, they added a column to the views (VIEW_UNIQUE_ID) that concatenates the ROWIDs from the component tables to construct a UID for each union.
But this script does not work, even though it follows the same form as the previous one:
SELECT VIEW_UNIQUE_ID, [columns]
FROM [view]
WHERE VIEW_UNIQUE_ID IN (
SELECT VIEW_UNIQUE_ID
FROM (
SELECT VIEW_UNIQUE_ID
FROM [view]
WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
ORDER BY VIEW_UNIQUE_ID
)
WHERE ROWNUM <= 10000
)
ORDER BY VIEW_UNIQUE_ID
It hangs indefinitely with no response from the Oracle server. I've waited 20+ minutes and the SQLTools dialog box indicating a running query remains the same, with no progress or updates.
I've tested each subquery independently and each works fine and takes a very short amount of time (<= 1 second), so the view itself is sound. But as soon as the inner two SELECT queries are added with "WHERE VIEW_UNIQUE_ID IN...", it hangs.
Why doesn't this query work for views? In what important way are they not interchangeable here?
Updated: the architecture of the solution stipulates that it is to be stateless, so I shouldn't try to make the web service preserve any index state information between requests from consumers.
they added a column to the views
(VIEW_UNIQUE_ID) that concatenates the
ROWIDs from the component tables to
construct a UID for each union.
God, that is the most obscene idea I've seen in a long time.
Let's say the view is a simple one like
SELECT C.CUST_ID, C.CUST_NAME, O.ORDER_ID, C.ROWID||':'||O.ROWID VIEW_UNIQUE_ID
FROM CUSTOMER C JOIN ORDER O ON C.CUST_ID = O.CUST_ID
Every time you want to do the
SELECT VIEW_UNIQUE_ID
FROM [view]
WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
ORDER BY VIEW_UNIQUE_ID
It has to build that entire result set, apply the filter, and order it. For anything other than trivially sized tables, that will be a nightmare.
Stop using the database to paginate/chunk the data here and do that in the client. Open the database connection, execute the query, fetch the first ten thousand rows from the query, index them, fetch the next ten thousand. Don't close and reopen the query each time, only after you've processed each row. You'll be able to forget about ordering.
For stateless, you need to re-architect. The whole thing with concatenated ROWIDs will not fly.
Start by putting the records to be processed into a fresh table, then you can flag them/process them/delete them in chunks.
INSERT INTO pending_table
SELECT 'N' state_flag, v.* FROM view v;
<start looping here>
UPDATE pending_table
SET state_flag = 'P'
WHERE ROWNUM < 10000;
COMMIT;
SELECT * FROM pending_table
WHERE state_flag = 'P';
<client processing>
DELETE FROM pending_table
WHERE state_flag = 'P';
<go back to start of loop, and keep going until pending_table is empty>

Resources