I have two tables, say STOCK and ITEM. We have a query that deletes some records from the ITEM table:
delete from ITEM where item_id not in(select itemId from STOCK)
I now have more than 1,500,000 records to delete, and the query is taking a long time to run.
When I searched, I found some more efficient ways to do this.
One way:
CREATE TABLE ITEM_TEMP AS
SELECT * FROM ITEM WHERE item_id in(select itemId from STOCK) ;
TRUNCATE TABLE ITEM;
INSERT /*+ APPEND */ INTO ITEM SELECT * FROM ITEM_TEMP;
DROP TABLE ITEM_TEMP;
The second way: instead of truncating, just drop ITEM and then rename ITEM_TEMP to ITEM. In this case, though, I have to recreate all the indexes.
Can anyone suggest which of the above is more efficient? I cannot test this in production.
I think the correct approach here depends on your environment.
If there are privileges on the table that must not be affected, or that would at least have to be restored after you drop the table, then the INSERT /*+ APPEND */ approach may simply be more reliable. The same goes for triggers, foreign keys, and any other objects that are dropped automatically when the base table is dropped (although foreign keys complicate the truncate, of course).
I would usually go for the truncate-and-insert method for that reason. Don't worry about the presence of indexes on the table -- a direct path insert is very efficient at building them.
However, if you have a simple table without dependent objects then there's nothing wrong with the drop-and-rename approach.
I also would not rule out just running multiple deletes of a limited number of rows, especially if this is in a production environment.
The best way, in terms of used space (and the high-water mark) and performance, is to drop the table and then rename the ITEM_TEMP table. But, as you mentioned, after that you need to recreate the indexes (and also grants, triggers, and constraints). All dependent objects will also be invalidated.
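The drop-and-rename route described above might look roughly like this. It is only a sketch: the index name item_id_idx is made up for illustration, and the real list of indexes, grants, triggers, and constraints to recreate has to come from your own schema.

```sql
-- Keep only the rows that have a matching STOCK entry.
CREATE TABLE ITEM_TEMP AS
  SELECT * FROM ITEM
  WHERE item_id IN (SELECT itemId FROM STOCK);

-- Dropping ITEM also drops its indexes and triggers and the grants on it.
DROP TABLE ITEM;

-- Oracle syntax is RENAME old_name TO new_name (no TABLE keyword).
RENAME ITEM_TEMP TO ITEM;

-- Recreate whatever the original table had; this index is hypothetical.
CREATE INDEX item_id_idx ON ITEM (item_id);
```

Capturing the DDL of the existing indexes and grants before the drop makes the last step much less error-prone.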
Sometimes I delete in portions:
begin
  loop
    delete from ITEM where item_id not in(select itemId from STOCK) and rownum < 10000;
    exit when SQL%ROWCOUNT = 0;
    commit;
  end loop;
end;
Since you have a very large number of rows, it would be better to use a partitioned table, perhaps list-partitioned on "itemId". Then you can simply drop a partition.
Your application could also run faster. This needs a design change, but it will pay off in the long run.
Related
I'm working with an application that has a large amount of outdated data clogging up a table in my databank. Ideally, I'd want to delete all entries in the table whose reference date is too old:
delete outdatedTable where referenceDate < :deletionCutoffDate
If this statement were to be run, it would take ages to complete, so I'd rather break it up into chunks with the following:
delete outdatedTable where referenceDate < :deletionCutoffDate and rownum <= 10000
In testing, this works surprisingly slowly. The following query, however, runs dramatically faster:
delete outdatedTable where rownum <= 10000
I've been reading through multiple blogs and similar questions on StackOverflow, but I haven't yet found a straightforward description of how/whether using rownum affects the Oracle optimizer when there are other Where clauses in the query. In my case, it seems to me as if Oracle checks
referenceDate < :deletionCutoffDate
on every single row, executes a massive Select on all matching rows, and only then filters out the top 10000 rows to return. Is this in fact the case? If so, is there any clever way to make Oracle stop checking the Where clause as soon as it's found enough matching rows?
How about a different approach, without so much DML on the table? As a permanent solution for the future, you could go for table partitioning.
Create a new table with required partition(s).
Move ONLY the required rows from your existing table to the new partitioned table.
Once the new table is populated, add the required constraints and indexes.
Drop the old table.
In future, you would just need to DROP the old partitions.
CTAS (create table as select) is another way; however, if you want the new table to be partitioned, you would have to use the exchange partition concept.
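The partitioning steps above could be sketched as follows. The table and column names are borrowed from the question; the id column, the partition names, the boundary dates, and the index name are all assumptions made up for illustration.

```sql
-- 1. New table, range-partitioned on the date used for purging.
CREATE TABLE outdatedTable_part (
  id            NUMBER,
  referenceDate DATE
)
PARTITION BY RANGE (referenceDate) (
  PARTITION p2012 VALUES LESS THAN (DATE '2013-01-01'),
  PARTITION p2013 VALUES LESS THAN (DATE '2014-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- 2. Move only the rows worth keeping (direct-path insert).
INSERT /*+ APPEND */ INTO outdatedTable_part (id, referenceDate)
  SELECT id, referenceDate
  FROM   outdatedTable
  WHERE  referenceDate >= DATE '2012-01-01';
COMMIT;

-- 3. Add indexes and constraints after the load.
CREATE INDEX outdatedTable_part_dt_idx
  ON outdatedTable_part (referenceDate) LOCAL;

-- 4. Swap the tables.
DROP TABLE outdatedTable;
RENAME outdatedTable_part TO outdatedTable;

-- 5. From now on, purging a year of old data is one DDL statement.
ALTER TABLE outdatedTable DROP PARTITION p2012;
```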
First of all, you should read about SQL execution plans and learn how to read them. That will help you find answers to questions like this.
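As a starting point, the plan for the slow chunked delete from the question can be displayed with EXPLAIN PLAN and DBMS_XPLAN. This is a minimal sketch; the plan it shows is the optimizer's estimate and may differ from the plan actually used at run time.

```sql
EXPLAIN PLAN FOR
  DELETE outdatedTable
  WHERE  referenceDate < :deletionCutoffDate
  AND    rownum <= 10000;

-- Show the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the output, look at how the table is accessed (full scan or index range scan on referenceDate) and where the COUNT STOPKEY step for the rownum limit sits relative to that access.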
Generally, a single delete is more efficient than several chunked ones. Its main disadvantage is heavy use of the undo tablespace.
If you wish to delete most of the rows in a table, a much faster way is usually this trick:
-- substitute a literal for :date_limit; DDL does not accept bind variables
create table new_table as select * from old_table where referenceDate >= :date_limit;
drop table old_table;
rename new_table to old_table;
-- ... recreate indexes and other stuff ...
If you need to do this more than once, partitioning is a much better way. If the table is partitioned by date, you can select current data quickly and drop a partition of outdated data in milliseconds.
Finally, partitioning is a way to avoid 'deleting outdated records' altogether. Sometimes we need old data, and it's sad if we have deleted it with our own hands. With partitioning you can archive outdated partitions outside the database and reattach them when you need to access the old data.
This is an old request, but I'd like to show another approach (also using partitions).
Depending on what you consider old, you could create corresponding partitions (optimally exactly two; one current, one old; but you could just as well make more), e.g.:
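-- Oracle needs a column as the partitioning key, so the parity is assumed to be
-- exposed as a virtual column, e.g.
--   year_parity AS (MOD(EXTRACT(YEAR FROM referenceDate), 2))
PARTITION BY LIST ( year_parity )
(
PARTITION year_odd VALUES (1),
PARTITION year_even VALUES (0)
);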
This could as well be months (Jan, Feb, ... Dec), decades (XX0X, XX1X, ... XX9X), half years (first_half, second_half), etc. Anything circular.
Then whenever you want to get rid of old data, truncate:
ALTER TABLE mytable TRUNCATE PARTITION year_even;
delete from your_table
where PK not in
(select PK from your_table where rownum<=...) -- these are the records you want to keep
I have a big table with a lot of data, split into multiple partitions. I want to keep a few partitions as they are but delete the rest of the data from the table. I searched for a similar question and couldn't find one on Stack Overflow. What is the best way to write a query in Oracle to achieve this?
It is easy to delete data from a specific partition: this statement clears down all the data for February 2012:
delete from t23 partition (feb2012);
A quicker method is to truncate the partition:
alter table t23 truncate partition feb2012;
There are two potential snags here:
Oracle won't let us truncate partitions if we have foreign keys referencing the table.
The operation invalidates any partitioned indexes, so we need to rebuild them afterwards.
Also, it's DDL, so no rollback.
If we never again want to store data for that month we can drop the partition:
alter table t23 drop partition feb2012;
The problem arises when we want to zap multiple partitions and we don't fancy all that typing. We cannot parameterise the partition name, because it's an object name, not a variable (no quotes). That leaves only dynamic SQL.
As you want to remove most of the data but retain the partition structure, truncating the partitions is the best option. Remember to disable any integrity constraints (and to reinstate them afterwards).
declare
    stmt varchar2(32767);
begin
    for lrec in ( select partition_name
                  from user_tab_partitions
                  where table_name = 'T23'
                  and partition_name like '%2012'
                )
    loop
        stmt := 'alter table t23 truncate partition '
                    || lrec.partition_name
                  ;
        dbms_output.put_line(stmt);
        execute immediate stmt;
    end loop;
end;
/
You should definitely run the loop first with execute immediate call commented out, so you can see which partitions your WHERE clause is selecting. Obviously you have a back-up and can recover data you didn't mean to remove. But the quickest way to undertake a restore is not to need one.
Afterwards run this query to see which partitions you should rebuild:
select ip.index_name, ip.partition_name, ip.status
from user_indexes i
join user_ind_partitions ip
on ip.index_name = i.index_name
where i.table_name = 'T23'
and ip.status = 'UNUSABLE';
You can automate the rebuild statements in a similar fashion.
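One way the rebuild might be automated, reusing the query above as the cursor. This is a sketch that assumes simple (not composite) partitioned indexes; subpartitioned indexes would need REBUILD SUBPARTITION instead, and unusable non-partitioned indexes show up in user_indexes.status rather than here.

```sql
declare
    stmt varchar2(32767);
begin
    for lrec in ( select ip.index_name, ip.partition_name
                  from user_indexes i
                  join user_ind_partitions ip
                    on ip.index_name = i.index_name
                  where i.table_name = 'T23'
                  and ip.status = 'UNUSABLE'
                )
    loop
        stmt := 'alter index ' || lrec.index_name
             || ' rebuild partition ' || lrec.partition_name;
        dbms_output.put_line(stmt);   -- print first so the run can be reviewed
        execute immediate stmt;
    end loop;
end;
/
```

As with the truncate loop, a dry run with the execute immediate call commented out shows what would be rebuilt.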
" I am thinking of copying the data of partitions I need into a temp
table and truncate the original table and copy back the data from temp
table to original table. "
That's another way of doing things. With exchange partition it might be quite quick. It might also be slower. It also depends on things like foreign keys and indexes, and the ratio of zapped partitions to retained ones. If performance is important and/or you need to undertake this operation regularly, then you should benchmark the various options and see what works best for you.
You must be very careful when dropping a partition from a partitioned table. Partitioned tables are usually used for big data tables, and if (and only if) you have a global index on the table, dropping a partition makes that global index invalid. You then have to rebuild the global index on a big table, which is a disaster.
To minimize the side effects on queries against the table in this scenario, I first delete the records in the partition so that it is empty, and then use
ALTER TABLE table_name DROP PARTITION partition_name UPDATE GLOBAL INDEXES;
to drop the empty partition without invalidating my global index.
I have a table in Oracle into which a third party inserts data. I want to populate master tables from that table. What is the best way to do this, performance-wise, using collections?
E.g. suppose the table that the third party populates is 'EMP_TMP'.
Now I want to populate the 'EMPLOYEE' master table from the EMP_TMP table through a procedure.
There is one condition: IF the SAME EMPID (this is not the primary key) EXISTS, then we have to UPDATE every row in the table that has the SAME EMPID, ELSE we have to INSERT a NEW RECORD.
[Note: Here EMPID is VARCHAR2, and EMPNO will be the primary key, for which we will use a SEQUENCE]
I think MERGE will not perform much better here, since we can't use a collection in a MERGE statement.
Well, if performance is your primary consideration, and you don't like MERGE, then how about this (run as script, single transaction):
delete from EMPLOYEE where emp_id IN (
select emp_id from EMP_TMP);
insert into EMPLOYEE
select * from EMP_TMP;
commit;
Obviously not the "safest" approach (and as written assumes exact same table definitions and you have the rollback), but should be fast (you could also mess with IN vs EXISTS etc). And I couldn't quite understand your post if emp_id or emp_no was the common key in these 2 tables, but use whichever makes sense in your situation.
Create a procedure, you need to be using PL/SQL.
Do an update first then test sql%rowcount.
If it is 0, no updates were done and you have to do an insert instead.
I think that this is fairly efficient.
pseudo code
Update table;
if sql%rowcount = 0 then
-- get new sequence number
insert into table;
END IF;
COMMIT;
HTH
Harv
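A runnable version of the update-then-insert pseudo code above might look like this. It is a sketch under assumptions: the emp_name column and the employee_seq sequence are hypothetical, and emp_id / emp_no stand in for the EMPID / EMPNO columns from the question.

```sql
begin
    for r in ( select emp_id, emp_name from EMP_TMP )
    loop
        -- Update every EMPLOYEE row that shares this emp_id.
        update EMPLOYEE
        set    emp_name = r.emp_name
        where  emp_id = r.emp_id;

        -- Nothing matched, so this emp_id is new: insert it.
        if sql%rowcount = 0 then
            insert into EMPLOYEE (emp_no, emp_id, emp_name)
            values (employee_seq.nextval, r.emp_id, r.emp_name);
        end if;
    end loop;
    commit;
end;
/
```

This processes EMP_TMP row by row, so for large volumes a single set-based statement is likely to be faster; the loop is mainly easier to follow and to extend.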
Let's say I have a Big and a Bigger table.
I need to cycle through the Big table, that is indexed but not sequential (since it is a filter of a sequentially indexed Bigger table).
For this example, let's say I needed to cycle through about 20000 rows.
Should I do 20000 of these
set @currentID = (select min(ID) from myData where ID > @currentID)
or
create a (big) temporary sequentially indexed table (a copy of the Big table) and do 20000 of
@Row = @Row + 1
?
I imagine that doing 20000 filters of the Bigger table just to fetch the next ID is heavy, but so must be filling a big (Big sized) temporary table just to add a dummy identity column.
Is the solution somewhere else?
For example, if I could loop through the results of the select statement (the filter of the Bigger table that originates "table" (actually a resultset) Big) without needing to create temporary tables, it would be ideal, but I seem to be unable to add something like an IDENTITY(1,1) dummy column to the results.
Thanks!
You may want to consider finding out how to do your work set based instead of RBAR. With that said, for very big tables, you may want to not make a temp table so that you are sure that you have live data if you suspect that the proc may run for a while in production. If your proc fails, you'll be able to pick up where you left off. If you use a temp table then if your proc crashes, then you could lose data that hasn't been completed yet.
You need to provide more information on what your end result is. It is only very rarely necessary to do row-by-row processing (and it is almost always the worst possible choice from a performance perspective). This article will get you started on how to do many tasks in a set-based manner:
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
If you just want a temp table with an identity, here are two methods:
create table #temp ( test varchar (10) , id int identity)
insert #temp (test)
select test from mytable
select test, identity(int) as id into #temp from mytable
I think a join will serve your purposes better.
SELECT BIG.*, BIGGER.* -- add additional calcs here involving BIG and BIGGER
FROM TableBig BIG (NOLOCK)
JOIN TableBigger BIGGER (NOLOCK)
ON BIG.ID = BIGGER.ID
This will limit the set you are working with. But again, it comes down to the specifics of your solution.
Remember too, you can do bulk inserts and bulk updates in this manner too.
I have a very large table (5 million records). I'm trying to obfuscate the table's VARCHAR2 columns with random alphanumerics for every record in the table. My procedure executes successfully on smaller datasets, but it will eventually be used on a remote db whose settings I can't control, so I'd like to execute the UPDATE statement in batches to avoid running out of undo space.
Is there some kind of option I can enable, or a standard way to do the update in chunks?
I'll add that there won't be any distinguishing features of the records that haven't been obfuscated so my one thought of using rownum in a loop won't work (I think).
If you are going to update every row in a table, you are better off doing a Create Table As Select, then drop/truncate the original table and re-append with the new data. If you've got the partitioning option, you can create your new table as a table with a single partition and simply swap it with EXCHANGE PARTITION.
Inserts require a LOT less undo, and a direct path insert with nologging (/*+ APPEND */ hint) won't generate much redo either.
With either mechanism, there would probably still be 'forensic' evidence of the old values (e.g. preserved in undo or in "available" space allocated to the table due to row movement).
The following is untested, but should work:
declare
    l_fetchsize number := 10000;
    cursor cur_getrows is
        select rowid, random_function(my_column)
        from my_table;
    type rowid_tbl_type is table of urowid;
    type my_column_tbl_type is table of my_table.my_column%type;
    rowid_tbl rowid_tbl_type;
    my_column_tbl my_column_tbl_type;
begin
    open cur_getrows;
    loop
        fetch cur_getrows bulk collect
            into rowid_tbl, my_column_tbl
            limit l_fetchsize;
        exit when rowid_tbl.count = 0;
        forall i in rowid_tbl.first..rowid_tbl.last
            update my_table
            set my_column = my_column_tbl(i)
            where rowid = rowid_tbl(i);
        commit;
    end loop;
    close cur_getrows;
end;
/
This isn't optimally efficient -- a single update would be -- but it'll do smaller, user-tunable batches, using ROWID.
I do this by mapping the primary key to an integer (mod n), and then perform the update for each x, where 0 <= x < n.
For example, maybe you are unlucky and the primary key is a string. You can hash it with your favorite hash function, and break it into three partitions:
UPDATE myTable SET a=doMyUpdate(a) WHERE MOD(ORA_HASH(ID), 3)=0
UPDATE myTable SET a=doMyUpdate(a) WHERE MOD(ORA_HASH(ID), 3)=1
UPDATE myTable SET a=doMyUpdate(a) WHERE MOD(ORA_HASH(ID), 3)=2
You may have more partitions, and may want to put this into a loop (with some commits).
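Put into a loop with a commit per bucket, the same idea might look like this. It is a minimal sketch that reuses myTable, ID, a, and doMyUpdate from the statements above; the bucket count n is an arbitrary value to tune.

```sql
declare
    n constant pls_integer := 3;   -- number of buckets (assumed; raise it for smaller batches)
begin
    for x in 0 .. n - 1
    loop
        update myTable
        set    a = doMyUpdate(a)
        where  mod(ora_hash(ID), n) = x;
        commit;                     -- release undo after each bucket
    end loop;
end;
/
```

Each pass still scans the whole table unless the predicate can use an index, so more buckets mean smaller transactions but more total work.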
If I had to update millions of records I would probably opt to NOT update.
I would more likely create a new table and then insert the data from the old table, since an insert doesn't take up a lot of redo space and takes less undo.
CREATE TABLE new_table as select <do the update "here"> from old_table;
index new_table
grant on new table
add constraints on new_table
etc on new_table
drop table old_table
rename new_table to old_table;
You can do that using parallel query, with nologging on most operations, generating very little redo and no undo at all -- in a fraction of the time it would take to update the data.
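The first step of that outline, done in parallel with nologging, could be sketched like this. The column names id and my_column, the function random_function, and the degree of parallelism 4 are assumptions for illustration only.

```sql
-- Build the "updated" copy in one direct-path, parallel operation.
create table new_table
    nologging
    parallel 4
as
    select id,
           random_function(my_column) as my_column   -- the "update" happens here
    from   old_table;

-- Put the attributes back to normal once the load is done.
alter table new_table logging;
alter table new_table noparallel;
```

Data loaded with nologging cannot be recovered from the redo stream, so take a backup afterwards. If the database runs in force logging mode, the nologging attribute is ignored.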