Oracle, Spring JDBC: the fastest way to delete data from a table

I use JdbcTemplate's batchUpdate to insert a large amount of data into DB tables.
What is the fastest way to delete it?
I use
DELETE FROM tableName WHERE ID >= MIN_ID
where MIN_ID is the start value of the sequence, 1000000.
Note, I want to delete from MIN_ID upwards and keep the data with IDs below 1000000.
Is there a better approach?
What best practices should one follow?
Note, I use an Oracle DB.

If you want to delete the entire table (since MIN_ID is the sequence's start value), you can use TRUNCATE TABLE:
TRUNCATE TABLE tableName;

If you want to delete a big part of the table, it is usually cheaper to create a new table with create table new_table as select * from tableName where nvl(ID,-1) < MIN_ID; and swap the names. Don't forget about swapping indexes and re-creating constraints and other dependent objects. Of course, if someone can insert into the table during the swap, you need to either lock the table or implement some consistency mechanism that takes care of data inserted during the swap.
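A minimal sketch of that approach, with MIN_ID from the question spelled out (the _new/_old names are illustrative):
-- keep only the rows below MIN_ID; CTAS is a direct-path operation and generates almost no undo
create table tableName_new as
select * from tableName where nvl(ID, -1) < 1000000;
-- swap the names; indexes, constraints, grants and triggers must be re-created on the new table
rename tableName to tableName_old;
rename tableName_new to tableName;
drop table tableName_old;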

Related

Delete rows from a partitioned table - best way

I want to delete around 1 million records from a partitioned table of around 10-13 million rows. As of now only 2 partitions exist in the table, containing July data and August data, and I want to delete the July data. Can you please let me know if a simple DELETE FROM table PARTITION (0715) is OK to do? Are there possibilities of fragmentation, or is there a better way?
Thank you
DELETE is a rather costly operation on large partitioned tables (though 10M rows is not really large). Typically you try to avoid it and remove the data partition-wise using drop partition.
The simplest scheme is a rolling window, where you define a range-partitioning scheme and drop the oldest partition after the retention interval.
If you need more control you may use the CTAS and exchange-back approach.
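A minimal sketch of the rolling-window scheme (table, column and partition names are illustrative):
create table events (
  event_date date,
  payload    varchar2(100)
)
partition by range (event_date) (
  partition p_2015_07 values less than (date '2015-08-01'),
  partition p_2015_08 values less than (date '2015-09-01')
);
-- after the retention interval, remove the oldest month with one cheap DDL statement
alter table events drop partition p_2015_07;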
Instead of deleting a large part of a partition, create a copy of it:
create table TMP as
select * from TAB PARTITION (ppp)
where <predicate to filter out records to be omitted for partition ppp>
Create indexes on the TMP table in the same structure as the LOCAL indexes of the partitioned table.
Then exchange the temporary table with the partition:
ALTER TABLE TAB
EXCHANGE PARTITION ppp WITH TABLE TMP including indexes
WITHOUT VALIDATION
Note there is no fragmentation as a result; on the contrary, you may use this to reorganize the partition data (e.g. with an ORDER BY in the CTAS, or with COMPRESS, etc.).
You can truncate the partition of the given table. You can also perform a DELETE if you only want to delete a few rows from the partition. Please share your table structure along with the partition details so that it will be easier for people here to assist you.

Delete records in an efficient way

I've two tables, say STOCK and ITEM. We have a query to delete some records from the ITEM table:
delete from ITEM where item_id not in (select itemId from STOCK)
And now I have more than 1,500,000 records to delete, and the query takes a long time.
When I searched, I found some more efficient ways to do this.
One way:
CREATE TABLE ITEM_TEMP AS
SELECT * FROM ITEM WHERE item_id in(select itemId from STOCK) ;
TRUNCATE TABLE ITEM;
INSERT /*+ APPEND */ INTO ITEM SELECT * FROM ITEM_TEMP;
DROP TABLE ITEM_TEMP;
Secondly, instead of truncating, just drop ITEM and then rename ITEM_TEMP to ITEM. But in this case I'd have to re-create all the indexes.
Can anyone please suggest which one of the above is more efficient, as I cannot check this in production?
I think the correct approach depends on your environment here.
If there are privileges on the table that must not be affected, or at least must be restored if you drop the table, then the truncate plus INSERT /*+ APPEND */ may simply be more reliable. The same goes for triggers, foreign keys, or any objects that would be automatically dropped along with the base table (foreign keys complicate the truncate, of course).
I would usually go for the truncate-and-insert method based on that. Don't worry about the presence of indexes on the table -- a direct-path insert is very efficient at building them.
However, if you have a simple table without dependent objects then there's nothing wrong with the drop-and-rename approach.
I also would not rule out just running multiple deletes of a limited number of rows, especially if this is in a production environment.
The best way in terms of used space (and the high-water mark) and performance is to drop the table and then rename ITEM_TEMP. But, as you mentioned, after that you need to re-create the indexes (also grants, triggers, constraints). Also, all dependent objects will be invalidated.
Sometimes I delete in portions:
begin
  loop
    -- delete in limited batches to keep undo usage bounded
    delete from ITEM
    where item_id not in (select itemId from STOCK)
    and rownum < 10000;
    exit when SQL%ROWCOUNT = 0;  -- nothing left to delete
    commit;
  end loop;
end;
Since you have a very high number of rows, it is better to use a partitioned table, maybe list-partitioned on itemId. Then you can easily drop a partition, and your application could also run faster. This needs a design change, but it will pay off in the long run.

Optimizing a delete... where query with rownum

I'm working with an application that has a large amount of outdated data clogging up a table in my database. Ideally, I'd want to delete all entries in the table whose reference date is too old:
delete outdatedTable where referenceDate < :deletionCutoffDate
If this statement were to be run, it would take ages to complete, so I'd rather break it up into chunks with the following:
delete outdatedTable where referenceDate < :deletionCutoffDate and rownum <= 10000
In testing, this works surprisingly slowly. The following query, however, runs dramatically faster:
delete outdatedTable where rownum <= 10000
I've been reading through multiple blogs and similar questions on Stack Overflow, but I haven't yet found a straightforward description of how (or whether) using rownum affects the Oracle optimizer when there are other WHERE clauses in the query. In my case, it seems as if Oracle checks
referenceDate < :deletionCutoffDate
on every single row, executes a massive select on all matching rows, and only then takes the first 10000 rows. Is this in fact the case? If so, is there any clever way to make Oracle stop checking the WHERE clause as soon as it has found enough matching rows?
How about a different approach without so much DML on the table? As a permanent solution for the future, you could go for table partitioning:
Create a new table with required partition(s).
Move ONLY the required rows from your existing table to the new partitioned table.
Once the new table is populated, add the required constraints and indexes.
Drop the old table.
In future, you would just need to DROP the old partitions.
CTAS (create table as select) is another way; however, if you want the new table to be partitioned, you would have to go for the exchange-partition concept.
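A minimal sketch of the steps above, assuming outdatedTable has the referenceDate column from the question (all other names and the partition bounds are illustrative):
-- 1. new table, range-partitioned by the reference date
create table outdatedTable_new (
  id            number,
  referenceDate date
)
partition by range (referenceDate) (
  partition p_2013 values less than (date '2014-01-01'),
  partition p_2014 values less than (date '2015-01-01'),
  partition p_max  values less than (maxvalue)
);
-- 2. move only the rows you want to keep (direct-path insert)
insert /*+ append */ into outdatedTable_new
select id, referenceDate from outdatedTable
where referenceDate >= :deletionCutoffDate;
commit;
-- 3./4. add constraints and indexes, drop the old table, rename the new one
-- 5. in future, ageing out a whole period is one cheap DDL statement:
alter table outdatedTable_new drop partition p_2013;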
First of all, you should read about SQL statements' execution plans and learn how to EXPLAIN them. It will help you find the answers to questions like this.
Generally, a single delete is more efficient than several chunked ones. Its main disadvantage is extreme use of the undo tablespace.
If you wish to delete most rows of a table, a much faster way is usually this trick:
create table new_table as select * from old_table where referenceDate >= :date_limit;
drop table old_table;
rename new_table to old_table;
... recreate indexes and other stuff ...
If you wish to do this more than once, partitioning is a much better way. If the table is partitioned by date, you can select the current data quickly and you can drop a partition with outdated data in milliseconds.
Lastly, partitioning is a way to avoid 'deleting outdated records' altogether. Sometimes we need old data, and it's a shame if we deleted it with our own hands. With partitioning you can archive outdated partitions outside of the database and reconnect them when you need to access the old data.
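A sketch of that archiving idea using exchange partition (all names are illustrative):
-- empty standalone table with the same column structure as the partitioned table
create table archive_2012 as
select * from old_table where 1 = 0;
-- swap the partition's data into the standalone table (a near-instant metadata operation)
alter table old_table exchange partition p_2012 with table archive_2012;
-- the partition is now empty; drop it, then export or transport archive_2012 as you wish
alter table old_table drop partition p_2012;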
This is an old question, but I'd like to show another approach (also using partitions).
Depending on what you consider old, you could create corresponding partitions (optimally exactly two: one current, one old; but you could just as well make more), e.g.:
-- in practice the expression must be a virtual column, e.g. year_parity as (mod(extract(year from referenceDate), 2))
PARTITION BY LIST ( year_parity )
(
  PARTITION year_odd VALUES (1),
  PARTITION year_even VALUES (0)
);
These could just as well be months (Jan, Feb, ... Dec), decades (XX0X, XX1X, ... XX9X), half-years (first_half, second_half), etc. Anything cyclical.
Then whenever you want to get rid of old data, truncate:
ALTER TABLE mytable TRUNCATE PARTITION year_even;
delete from your_table
where PK not in
(select PK from your_table where rownum <= ...) -- these are the records you want to keep

Oracle: how to delete from a table except for a few partitions' data

I have a big table with a lot of data, partitioned into multiple partitions. I want to keep a few partitions as they are but delete the rest of the data from the table. I tried searching for a similar question and couldn't find it on Stack Overflow. What is the best way to write a query in Oracle to achieve this?
It is easy to delete data from a specific partition: this statement clears down all the data for February 2012:
delete from t23 partition (feb2012);
A quicker method is to truncate the partition:
alter table t23 truncate partition feb2012;
There are two potential snags here:
Oracle won't let us truncate partitions if we have foreign keys referencing the table.
The operation invalidates any global indexes, so we need to rebuild them afterwards.
Also, it's DDL, so no rollback.
If we never again want to store data for that month we can drop the partition:
alter table t23 drop partition feb2012;
The problem arises when we want to zap multiple partitions and we don't fancy all that typing. We cannot parameterise the partition name, because it's an object name, not a variable (no quotes). So that leaves only dynamic SQL.
As you want to remove most of the data but retain the partition structure, truncating the partitions is the best option. Remember to disable any referencing integrity constraints (and to re-enable them afterwards).
declare
  stmt varchar2(32767);
begin
  for lrec in ( select partition_name
                from user_tab_partitions
                where table_name = 'T23'
                and partition_name like '%2012'
              )
  loop
    stmt := 'alter table t23 truncate partition '
            || lrec.partition_name;
    dbms_output.put_line(stmt);
    execute immediate stmt;
  end loop;
end;
/
You should definitely run the loop first with the execute immediate call commented out, so you can see which partitions your WHERE clause selects. Obviously you have a backup and can recover any data you didn't mean to remove. But the quickest way to undertake a restore is not to need one.
Afterwards, run this query to see which index partitions you need to rebuild:
select ip.index_name, ip.partition_name, ip.status
from user_indexes i
join user_ind_partitions ip
on ip.index_name = i.index_name
where i.table_name = 'T23'
and ip.status = 'UNUSABLE';
You can automate the rebuild statements in a similar fashion.
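For example, a sketch following the same pattern as the truncate loop above:
declare
  stmt varchar2(32767);
begin
  for lrec in ( select ip.index_name, ip.partition_name
                from user_indexes i
                join user_ind_partitions ip
                  on ip.index_name = i.index_name
                where i.table_name = 'T23'
                and ip.status = 'UNUSABLE'
              )
  loop
    stmt := 'alter index ' || lrec.index_name
            || ' rebuild partition ' || lrec.partition_name;
    dbms_output.put_line(stmt);
    execute immediate stmt;
  end loop;
end;
/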
" I am thinking of copying the data of partitions I need into a temp
table and truncate the original table and copy back the data from temp
table to original table. "
That's another way of doing things. With exchange partition it might be quite quick. It might also be slower. It also depends on things like foreign keys and indexes, and on the ratio of zapped partitions to retained ones. If performance is important and/or you need to undertake this operation regularly, then you should benchmark the various options and see what works best for you.
You must be very careful when dropping a partition from a partitioned table. Partitioned tables are usually used for big tables, and if (and only if) you have a global index on the table, dropping a partition makes your global index invalid, and having to rebuild a global index on a big table is a disaster.
To minimise the side effects on queries against the table in this scenario, I first delete the records in the partition so that it becomes empty, and then, with
ALTER TABLE table_name DROP PARTITION partition_name UPDATE GLOBAL INDEXES;
I drop the empty partition without making my global index invalid.

How do I UPDATE a large table in Oracle PL/SQL in batches to avoid running out of undo space?

I have a very large table (5 million records). I'm trying to obfuscate the table's VARCHAR2 columns with random alphanumerics for every record in the table. My procedure executes successfully on smaller datasets, but it will eventually be used on a remote DB whose settings I can't control, so I'd like to EXECUTE the UPDATE statement in batches to avoid running out of undo space.
Is there some kind of option I can enable, or a standard way to do the update in chunks?
I'll add that there won't be any distinguishing features of the records that haven't been obfuscated, so my one thought of using rownum in a loop won't work (I think).
If you are going to update every row in a table, you are better off doing a Create Table As Select, then dropping/truncating the original table and re-appending the new data. If you've got the partitioning option, you can create your new table as a table with a single partition and simply swap it in with EXCHANGE PARTITION.
Inserts require a LOT less undo, and a direct-path insert with nologging (the /*+ APPEND */ hint) won't generate much redo either.
With either mechanism, there would probably still be 'forensic' evidence of the old values (e.g. preserved in undo, or in "available" space allocated to the table due to row movement).
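A minimal sketch of that CTAS approach (my_table, pk_col and secret_col are hypothetical names):
-- direct-path, nologging copy with the obfuscation applied in the SELECT
create table my_table_new nologging as
select pk_col,
       dbms_random.string('x', 12) as secret_col  -- random alphanumerics
from my_table;
-- re-create indexes, constraints and grants on my_table_new, then swap:
drop table my_table;
rename my_table_new to my_table;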
The following is untested, but should work:
declare
  l_fetchsize number := 10000;
  cursor cur_getrows is
    select rowid, random_function(my_column)
    from my_table;
  type rowid_tbl_type     is table of urowid;
  type my_column_tbl_type is table of my_table.my_column%type;
  rowid_tbl     rowid_tbl_type;
  my_column_tbl my_column_tbl_type;
begin
  open cur_getrows;
  loop
    -- fetch the next batch of rowids and pre-computed replacement values
    fetch cur_getrows bulk collect
    into rowid_tbl, my_column_tbl
    limit l_fetchsize;
    exit when rowid_tbl.count = 0;
    -- apply the whole batch in one round trip, then commit to release undo
    forall i in rowid_tbl.first..rowid_tbl.last
      update my_table
      set my_column = my_column_tbl(i)
      where rowid = rowid_tbl(i);
    commit;
  end loop;
  close cur_getrows;
end;
/
This isn't optimally efficient -- a single update would be -- but it'll do smaller, user-tunable batches, using ROWID.
I do this by mapping the primary key to an integer (mod n), and then performing the update for each x, where 0 <= x < n.
For example, maybe you are unlucky and the primary key is a string. You can hash it with your favorite hash function and break the table into three partitions:
UPDATE myTable SET a=doMyUpdate(a) WHERE MOD(ORA_HASH(ID), 3)=0
UPDATE myTable SET a=doMyUpdate(a) WHERE MOD(ORA_HASH(ID), 3)=1
UPDATE myTable SET a=doMyUpdate(a) WHERE MOD(ORA_HASH(ID), 3)=2
You may have more partitions, and may want to put this into a loop (with some commits).
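A sketch of that loop (doMyUpdate and the bucket count of 3 come from the example above):
begin
  for x in 0 .. 2 loop
    update myTable
    set a = doMyUpdate(a)
    where mod(ora_hash(ID), 3) = x;
    commit;  -- release undo after each bucket
  end loop;
end;
/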
If I had to update millions of records I would probably opt NOT to update.
I would more likely create a temp table and then insert the data from the old table, since an insert doesn't take up a lot of redo space and takes less undo:
CREATE TABLE new_table AS SELECT <do the update "here"> FROM old_table;
-- index new_table
-- grant on new_table
-- add constraints on new_table
-- etc. on new_table
DROP TABLE old_table;
RENAME new_table TO old_table;
You can do that using parallel query, with nologging on most operations generating very little redo and no undo at all -- in a fraction of the time it would take to update the data.
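For instance, a sketch of that parallel, nologging variant (the degree of 8 is illustrative):
-- direct-path, parallel, nologging copy: minimal redo, no undo for the new rows
create table new_table parallel 8 nologging as
select /*+ parallel(t, 8) */ <do the update "here">
from old_table t;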
