I'm working with an application that has a large amount of outdated data clogging up a table in my databank. Ideally, I'd want to delete all entries in the table whose reference date is too old:
delete outdatedTable where referenceDate < :deletionCutoffDate
If this statement were to be run, it would take ages to complete, so I'd rather break it up into chunks with the following:
delete outdatedTable where referenceData < :deletionCutoffDate and rownum <= 10000
In testing, this works suprisingly slowly. The following query, however, runs dramatically faster:
delete outdatedTable where rownum <= 10000
I've been reading through multiple blogs and similar questions on StackOverflow, but I haven't yet found a straightforward description of how/whether using rownum affects the Oracle optimizer when there are other Where clauses in the query. In my case, it seems to me as if Oracle checks
referenceData < :deletionCutoffDate
on every single row, executes a massive Select on all matching rows, and only then filters out the top 10000 rows to return. Is this in fact the case? If so, is there any clever way to make Oracle stop checking the Where clause as soon as it's found enough matching rows?
How about a different approach without so much DML on the table. As a permanent solution for future you could go for table partitioning.
Create a new table with required partition(s).
Move ONLY the required rows from your existing table to the new partitioned table.
Once the new table is populated, add the required constraints and indexes.
Drop the old table.
In future, you would just need to DROP the old partitions.
CTAS(create table as select) is another way, however, if you want to have a new table with partition, you would have to go for exchange partition concept.
First of all, you should read about SQL statement's execution plan and learn how to explain in. It will help you to find answers on such questions.
Generally, one single delete is more effective than several chunked. It's main disadvantage is extremal using of undo tablespace.
If you wish to delete most rows of table, much faster way usially a trick:
create table new_table as select * from old_table where date >= :date_limit;
drop table old_table;
rename table new_table to old_table;
... recreate indexes and other stuff ...
If you wish to do it more than once, partitioning is a much better way. If table partitioned by date, you can select actual date quickly and you can drop partion with outdated data in milliseconds.
At last, paritioning if a way to dismiss 'deleting outdated records' at all. Sometimes we need old data, and it's sad if we delete it by own hands. With paritioning you can archive outdated partitions outside of the database, but connects them when you need to access old data.
This is an old request, but I'd like to show another approach (also using partitions).
Depending on what you consider old, you could create corresponding partitions (optimally exactly two; one current, one old; but you could just as well make more), e.g.:
PARTITION BY LIST ( mod(referenceDate,2) )
(
PARTITION year_odd VALUES (1),
PARTITION year_even VALUES (0)
);
This could as well be months (Jan, Feb, ... Dec), decades (XX0X, XX1X, ... XX9X), half years (first_half, second_half), etc. Anything circular.
Then whenever you want to get rid of old data, truncate:
ALTER TABLE mytable TRUNCATE PARTITION year_even;
delete from your_table
where PK not in
(select PK from your_table where rounum<=...) -- these records you want to leave
Related
We have a primary table that is Range partitioned by date with a 1-month interval. It's also a list sub-partitioned with 4 distinct values. So essentially it is one month partition having 4 sub-partitions.
Database: Oracle 19c
I need advice on how to effectively move the partition/sub-partition data from active schema to historical schema in another database.
Also, there are about 30 tables that are referenced partitioned on the primary table for which the data needs to be moved as well. Overall I'm looking to move about 2500 subpartitions
I'm not sure if an exchange partition would be the right approach in this scenario?
TIA
You could use exchange to get the data rapidly out of your active table, but you would still then to send that table over the wire to the remote history database to load it in.
In which case, using "exchange" probably is just adding more steps to the process for little gain. (There are still potential uses here depending on how you want to handle indexing etc).
But simplest is perhaps just transferring the data over, assuming a common structure between the two tables, ie
insert /*+ APPEND */ into history_table#remote_db
select * from active_table partition ( myparname )
I can't remember if partition naming syntax is supported over a db link, but if not, then the appropriate date predicates will do the same trick, and then just follow up with:
alter table active_table truncate partition myparname;
This is a general question about the Oracle MERGE INTO statement with a particular scenario, on Oracle RDBMS 12c.
Daily data will be loaded to StagingTableA - about 10m rows.
This will be MERGEd INTO TableA.
TableA will vary between 0 to 10m rows (matcing StagingTableA).
There may be times when TableA will be pruned/emptied and left with 0 rows.
Clearly, when TableA is empty, a straight INSERT will do the job, but the procedure has been written to use a MERGE INTO method to handle all scenarios.
The MERGE .. MATCH is on a indexed column.
My question is an uncertainty about how the MERGE handles the MATCH in circumstances where TableA will start empty, and then grow hugely during the MERGE execution. The MATCH on indexed columns will use a FTS as the stats will show the table has 0 rows.
At some point during the MERGE transaction, this will become inefficient.
Is the MERGE statement clever enough to detect this and change the execution plan, and start using the index instead of the FTS?
If this was done the old way with CURSOR, UPDATE and INSERT then we could potentially introduce a ANALYZE at a appropriate point (say after 50,000 processed) on the TableA to switch to a optimal plan.
I haven't been able to find any documentation dealing with this specific question.
Hopefully you've got a UNIQUE index on that table, which is based on the incoming data. If I was you, rather than using a simple MERGE I'd:
Mark all indexes on the table as UNUSABLE, except for the unique index.
INSERT all records
Catch the DUPLICATE VALUE ON INDEX exception at the time of INSERT and issue the appropriate UPDATE.
DELETE processed rows from the input record.
Commit every N records (1000? 10000? 100000? Your choice...), calling DBMS_STATS.GATHER_TABLE_STATS for the table you've inserted into after each COMMIT.
Best of luck.
I've two tables say STOCK and ITEM. We have a query to delete some records from ITEM table,
delete from ITEM where item_id not in(select itemId from STOCK)
And now I've more than 15,00,000 records to delete, the query was taking much time to do the operation.
When I searched, I found some efficient ways to do this action.
One way:
CREATE TABLE ITEM_TEMP AS
SELECT * FROM ITEM WHERE item_id in(select itemId from STOCK) ;
TRUNCATE TABLE ITEM;
INSERT /+ APPEND +/ INTO ITEM SELECT * FROM ITEM_TEMP;
DROP TABLE ITEM_TEMP;
Secondly instead of truncating just drop the ITEM and then rename the ITEM_TEMP to ITEM. But in this case I've to re create all the indexes.
Can anyone please suggest which one of the above is more efficient, as I could not check this in Production.
I think the correct approach depends on your environment, here.
If you have privileges on the table that must not be affected, or at least must be restored if you drop the table, then the INSERT /*+ APPEND */ may simply be more reliable. Triggers, similarly, or foreign keys, or any objects that will be automatically dropped when the base table is dropped (foreign keys complicate the truncate, of course).
I would usually go for the truncate and insert method based on that. don't worry about the presence on indexes on the table -- a direct path insert is very efficient at building them.
However, if you have a simple table without dependent objects then there's nothing wrong with the drop-and-rename approach.
I also would not rule out just running multiple deletes of a limited number of rows, especially if this is in a production environment.
Best way from used space (and high watermark) and performance is to drop table and then rename ITEM_TEMP table. But, as you mentioned, after that you need to recreate indexes (also grants, triggers, constraints). Also all depending objects will be invalidated.
Some times I try to delete by portions:
begin
loop
delete from ITEM where item_id not in(select itemId from STOCK) and rownum < 10000;
exit when SQL%ROWCOUNT = 0;
commit;
end loop;
end;
Since you have very high number of rows, it better use partition table , may be List partition on "itemId". Then you can easily drop a partition.
Also if your application could run faster. This need design change but it will give benefit in long run.
In our one of production database, we have 4 column table and there are no PK,UK constraints on it. only one notnull constraint on one column. The inserts are slow on this table and when I checked the indexes , there is one index which is built on all columns.
It is a normal table and not IOT. I really don't see a need of all column index, but wondering why the developers has created it?
Appreciate your thoughts?
It might be usefull, i.e. if you (mainly) query all columns oracle doesn't have to access the table at all, but can get all the data from the index. Though inserts take longer because a larger index has to be maintained by the dbms everytime.
One case where it could be useful is,
Say for example, you are trying to check the existence of records in this table and for that you have to have joins on all four columns. So in such a case if you have written a correlated query like below,
SELECT <something>
FROM table_1 t1
WHERE EXISTS
(SELECT 1 FROM table_t2 t2 where t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3 and t1.c4=t2.c4)
Apart from above case, it looks an error to me from developer's side.
Indexes are good to better query optimization but causes slow updates/inserts because the indexes needs to be updated at each modification.
If these tables first use is querying and inserts happens only in a specific periods like a batch at the beginning or the end of the day only, then you can remove the indexes before updating tables and then restore them.
In addition, all the queries all these tables need to be analysed to see which indexes are useful and which are not?
Anyway, You need to ask developers before removing these indexes.
I have a cursor that selects all rows in a table, a little over 500,000 rows. Read a row from cursor, INSERT into other table, which has two indexes, neither unique, one numeric, one 'DATE' type. COMMIT. Read next row from Cursor, INSERT...until Cursor is empty.
All my DATE column's values are the same, from a timestamp initialized at the start of the script.
This thing's been running for 24 hours, only posted 464K rows, a little less than 10K rows / hr.
Oracle 11g, 10 processors(!?)
Something has to be wrong. I think it's that DATE index trying to process all these entries with exactly the same value for that column.
Why don't you just do:
insert into target (columns....)
select columns and computed values
from source
commit
?
This slow by slow is doing far more damage to performance than an index that may not make any sense.
Indexes slow down inserts but speed up queries. This is normal.
If it is a problem you can remove the index, insert the rows, then add the index again. This can be faster if you are doing many inserts at once.
The way you are copying the data using cursors seems to be inefficient. You could try a set-based approach instead:
INSERT INTO table1 (x, y, z)
SELECT x, y, z FROM table2 WHERE ...
Committing after every inserted row doesn't make much sense. If you're worried about exceeding undo capacity, for example, you can keep a count of the inserts and issue a commit after every thousand rows.
Updating the indexes will have some impact but that's unavoidable if you can't drop (or disable) while the inserts are performed, but that's just how it goes. I'd expect the commits to have a bigger impact, though I suspect that's a topic with varied opinions.
This assumes you have a good reason for inserting from a cursor rather than as a direct insert into ... select from model.
In general, its often a good idea to delete the indexes before doing a massive insert and then add them back afterwards, so that the db doesnt have to try to update the indexes with each insert. Its been a long while since I've used oracle, but had you tried putting more than one insert statement in a transaction? That should also speed it up.
For operations like this you should look at oracle bulk operations, using FORALL and BULK COLLECT. It will reduce the number of DDL operations on the underlying tables considerably
create or replace procedure fast_proc is
type MyTable is table of source_table%ROWTYPE;
MyTable table;
begin
select * BULK COLLECT INTO table from source_table;
forall x in table.First..table.Last
insert into dest_table values table(x) ;
end;
Agreed on comment that what is killing your time is the 'slow by slow' processing. Copying 500,000 rows should be a matter of minutes.
The single INSERT ... SELECT FROM .... approach would be the best one, provided you have big enough Rollback segments. The database may even automatically apply parallel techniques to a plain SQL statement that it will not do with PL/SQL.
In addition you could look at using the /*+ APPEND */ hint - read up on it and see if it may apply to the situation with your target table.
o use all 10 cores you will need to either use plain parallel SQL, or run 10 copies of your pl/sql block, splitting the source table across the 10 copies.
In Oracle 10 this is a manual task (roll your own parallelism) but Oracle 11.2 introduces DBMS_PARALLEL_EXECUTE.
Failing that, bulking up your fetch / insert using the BULK COLLECT & bulk insert would be the next best option - process in chunks of 1000 or so rows (or larger). Again take a look as to whether DBMS_PARALLEL_EXECUTE may help you, or if you could submit the job in chunks via DBMS_JOB.
(Caveat : I don't have access to anything later than Oracle 10)