Deletes Slow on an Oracle BIG Table - oracle

I have a table with around 180 million records and 40 indexes. A nightly program loads data into this table, but due to certain business conditions we can only delete and then load data into it. The nightly program brings new records, or updates to existing records, from the source system. We have a limited window of about 6 hours to complete the extract from the source system, perform the business transformations and finally load the data into this target table so that it is ready for users to consume in the morning. The issue we are facing is that the delete from this table takes a lot of time, mainly due to the 40 indexes on the table (an average of 70,000 deletes per hour). I did some digging on the internet and found the options below.
a) Drop or disable indexes before the delete and rebuild them afterwards: After the delete and load, the program needs to perform quite a few updates for which the indexes are critical, and rebuilding a single index takes almost 1.5 hours because of the enormous amount of data in the table. So this approach is not feasible given the rebuild time and the limited window we have to get the data ready for the users.
b) Use bulk delete: Currently the program deletes based on rowid, one record at a time, as below:
DELETE
FROM <table>
WHERE rowid = g_wpk_tab(ln_i);
g_wpk_tab is the collection holding the rowids to be deleted; it is processed with FORALL, and an intermediate commit is issued every 50,000 deletes.
Tom of AskTom says in this discussion that a bulk delete and a row-by-row delete will take almost the same amount of time:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:5033906925164
So this won't be a feasible option either.
c) Regular delete: Tom of AskTom suggests using a regular delete, but even that takes a long time, probably due to the number of indexes on this table.
d) CTAS: This approach is out of the question because the program would need to recreate the table, create the 40 indexes and then proceed with the updates, and as mentioned above a single index takes at least 1.5 hours to create.
If you could provide me any other suggestions I would really appreciate it.
UPDATE: For now we have decided to go with the approach suggested by https://stackoverflow.com/users/409172/jonearles and archive instead of delete. The approach is to add a flag to the table, mark the records to be deleted as DELETE, and then have a post-delete program run during the day to physically remove them. This ensures the data is available for users at the right time. Since users consume the data via OBIEE, we plan to set a content-level filter on the table so that it ignores the archival column and users needn't know what to select and what to ignore.

Parallel DML: alter session enable parallel dml; delete /*+ parallel */ ...; commit;. Sometimes it's that easy.
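A minimal sketch of what that looks like end to end (the table name, alias, degree of parallelism and predicate here are illustrative, not from the question):

alter session enable parallel dml;

delete /*+ parallel(t 8) */ from big_table t
where t.load_date < :cutoff;   -- hypothetical predicate

commit;  -- parallel DML requires a commit before the session can query the table again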
Parallel DDL: alter index your_index rebuild nologging compress parallel;. NOLOGGING reduces the amount of redo generated during the index rebuild. COMPRESS can significantly reduce the size of a non-unique index, which significantly reduces the rebuild time. PARALLEL can also make a huge difference in rebuild time if you have more than one CPU or more than one disk. If you're not already using these options, I wouldn't be surprised if using all of them together improves index rebuilds by an order of magnitude. And then 1.5 hours * 40 indexes / 10 = 6 hours.
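As a hedged sketch, the same options can be applied to every index on the table in one go (table name and degree are assumptions; COMPRESS is left out of the loop because it is not valid for every index, e.g. single-column unique keys, and should be added case by case):

begin
  for ix in (select index_name from user_indexes where table_name = 'BIG_TABLE') loop
    execute immediate 'alter index ' || ix.index_name || ' rebuild nologging parallel 8';
    -- reset the attributes so later statements do not run parallel or skip redo unintentionally
    execute immediate 'alter index ' || ix.index_name || ' logging';
    execute immediate 'alter index ' || ix.index_name || ' noparallel';
  end loop;
end;
/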
Re-evaluate your indexes. Do you really need 40 indexes? It's entirely possible, but many indexes are only created because "indexes are magic". Make sure there's a legitimate reason behind each index. This can be very difficult to do; very few people document the reason for an index. Before you ask around, you may want to gather some information. Turn on index monitoring to see which indexes are really being used. And even if an index is used, see how it is used, perhaps through v$sql_plan. It's possible that an index is used for a specific statement but another index would have worked just as well.
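For example, index monitoring and the plan views can be queried roughly like this (index and table names are placeholders):

alter index big_table_ix1 monitoring usage;
-- ... repeat (or generate) the statement for the other indexes ...

-- after a representative workload period:
select index_name, used, start_monitoring, end_monitoring
from v$object_usage;

-- to see how a given index is actually used:
select distinct sql_id, operation, options
from v$sql_plan
where object_owner = user and object_name = 'BIG_TABLE_IX1';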
Archive instead of delete Instead of deleting, just set a flag to mark a row as archived, invalid, deleted, etc. This will avoid the immediate overhead of index maintenance. Ignore the rows temporarily and let some other job delete them later. The large downside to this is that it affects any query on the table.
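A sketch of the flag approach, assuming a new column named ARCHIVE_FLAG (all names are illustrative); updating a single unindexed column is far cheaper than a delete that has to maintain 40 indexes:

alter table big_table add (archive_flag char(1));

-- the nightly job marks rows instead of deleting them:
update big_table set archive_flag = 'Y' where rowid = g_wpk_tab(ln_i);

-- an off-peak job later does the physical delete:
delete from big_table where archive_flag = 'Y';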
Upgrading is probably out of the question, but 12c has an interesting new feature called in-database archiving. It's a more transparent way of accomplishing the same thing.
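If an upgrade ever becomes possible, the 12c feature looks roughly like this (sketch only; the hidden ORA_ARCHIVE_STATE column is maintained by Oracle once ROW ARCHIVAL is enabled):

alter table big_table row archival;

-- "archive" rows instead of deleting them:
update big_table set ora_archive_state = '1' where rowid = g_wpk_tab(ln_i);

-- sessions see only active rows by default; to see everything:
alter session set row archival visibility = all;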

Related

Data cleanup in Oracle DB is taking a long time for 300 billion records

Problem statement:
There is an address table in Oracle that has relationships with multiple tables such as subscriber, member, etc.
The current design is such that any change in the associated tables increments the record version throughout all tables.
So a new record is added to the address table even if the same address is already present, resulting in a large number of duplicate copies.
We need to identify and remove the duplicate records and update the foreign keys in the associated tables, while making sure this doesn't impact the running application.
Tried solution:
We have written a script for the cleanup logic, in which a unique hash is generated for every address. If the calculated hash is already present, the address is a duplicate; we then merge it into a single address record and update the foreign keys in the associated tables.
But the problem is that there are around 300 billion records in the address table, so this cleanup process is taking a lot of time and will take several days to complete.
We tried adding an index on the hash column, but the process still takes a long time.
We have also updated the insertion/query logic to use addresses as per the new structure (using the hash, and without the version) in order to handle incoming requests in production.
We are planning to do the processing in chunks, but it will be a very long, ongoing activity.
Questions:
We would like to know whether any further improvement can be made to the above approach.
Will distributed processing help here (maybe using Hadoop, Spark, Hive, MR, etc.)?
Is there some sort of tool that can be used here?
Suggestion 1
Use built-in delete parallel
delete /*+ parallel(t 8) */ mytable t where ...
Suggestion 2
Use distributed processing (Hadoop, Spark, Hive) - watch out for potential contention on indexes or table blocks. It is recommended to have each process work on a logically isolated subset (a DBMS_PARALLEL_EXECUTE sketch of this chunking, done inside the database, follows the example below), e.g.
process 1 - delete mytable t where id between 1000 and 1999
process 2 - delete mytable t where id between 2000 and 2999
...
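A hedged sketch of the same chunking done inside the database with DBMS_PARALLEL_EXECUTE, which splits the table into rowid ranges and runs the delete concurrently (task name, table name, duplicate-flag predicate and degree are all assumptions):

begin
  dbms_parallel_execute.create_task(task_name => 'purge_address_dups');

  dbms_parallel_execute.create_chunks_by_rowid(
      task_name   => 'purge_address_dups',
      table_owner => user,
      table_name  => 'ADDRESS',
      by_row      => true,
      chunk_size  => 100000);

  dbms_parallel_execute.run_task(
      task_name      => 'purge_address_dups',
      sql_stmt       => 'delete from address
                          where rowid between :start_id and :end_id
                            and is_duplicate = ''Y''',
      language_flag  => dbms_sql.native,
      parallel_level => 8);

  dbms_parallel_execute.drop_task(task_name => 'purge_address_dups');
end;
/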
Suggestion 3
If more than ~30% of the table needs to be deleted, the fastest way is to create an empty table, copy all required rows into it, drop the original table, rename the new one and recreate all indexes and constraints. Of course this requires downtime, and it greatly depends on the number of indexes - the more you have, the longer it will take.
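A rough sketch of that copy-and-swap, assuming a hypothetical is_duplicate flag identifies the rows to keep (all names are illustrative, and indexes/constraints/grants must be recreated before the table is released to the application):

create table address_new nologging parallel 8
as select /*+ parallel(a 8) */ *
   from address a
   where a.is_duplicate = 'N';

-- recreate indexes, constraints and grants on address_new here ...

drop table address purge;
alter table address_new rename to address;
alter table address noparallel logging;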
P.S. There are no "magic" tools to do it. In the end they all run the same SQL commands that you can run yourself.
It's possible to use the Oracle MERGE statement to insert the data if you use clean SQL.

Is there a way to make a SELECT query faster?

I want to select multiple rows from multiple tables, one of which has billions of rows. It sometimes takes 20 seconds, and with thousands of users running it, that is pretty bad.
I looked into COLUMNSTORE and tried it on my local machine, and the performance is 50x faster than usual! (Note that I was clearing the cache to see the difference.)
However, the downside is that I can't update, insert or delete rows, which is constantly being done on that table with the billions of rows.
Is there a way to optimize it? (Besides the (NOLOCK) dirty read; security is not an issue here, by the way.)
There are already indexes on that table, but they don't help.
Is there a way to perform BATCH EXECUTION (I see it does row execution)? Or any optimization advice?
Using Microsoft SQL Server 2012
When you get to the scale of billions of rows, you often need to take different approaches to handling the data. Separating the content into multiple databases stored on different machines might be more effective; however, the design is considerably more complex.
An alternative is to consider using a combination of partitioned tables with a column-based index. That way at least, you can stage the updated data for the partition and then swap the updated one for the existing one to perform updates. See: http://technet.microsoft.com/en-us/library/gg492088.aspx#Update
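The partition swap described there looks roughly like this in T-SQL (table names and the partition number are assumptions; the staging table must match the target's schema, and SWITCH has further requirements around filegroups and check constraints):

-- 1. switch the stale partition out into an empty staging table
ALTER TABLE dbo.FactBig SWITCH PARTITION 5 TO dbo.FactBig_Stage;
-- 2. apply the inserts/updates/deletes to dbo.FactBig_Stage and rebuild its columnstore index
-- 3. switch the refreshed partition back in
ALTER TABLE dbo.FactBig_Stage SWITCH TO dbo.FactBig PARTITION 5;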
An alternative is to consider using three tables: one that is static -- perhaps using column-based storage -- a second that is dynamic, holding only recent updates and inserts, and a third holding just a list of deleted rows identified by the primary key. You then use a view to reconcile the content for queries.
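A sketch of that reconciling view, assuming a primary key column named Id (all object names here are hypothetical):

CREATE VIEW dbo.BigTableCurrent AS
SELECT s.*
FROM dbo.BigTableStatic AS s            -- large, columnstore-indexed, rarely touched
WHERE NOT EXISTS (SELECT 1 FROM dbo.BigTableDeleted x WHERE x.Id = s.Id)
  AND NOT EXISTS (SELECT 1 FROM dbo.BigTableDelta   d WHERE d.Id = s.Id)
UNION ALL
SELECT d.*
FROM dbo.BigTableDelta AS d             -- small, row-store, receives inserts and updates
WHERE NOT EXISTS (SELECT 1 FROM dbo.BigTableDeleted x WHERE x.Id = d.Id);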

Oracle Transaction - Count table

I have a table where I need to constrain by category and then find all overlapping dates against some date range. This takes about 2 seconds, which is unacceptable to do on every transaction when transactions occur at roughly 50/s. The alternative is to create some tally table -- then again, I don't know how good an idea this is because things can get out of sync.
Date on Rent    # Rented    Category
9/5/2011        5           CATEGORY1
In Oracle (PL/SQL, if it matters), how can I maintain this performance, but ensure that concurrent transactions don't screw up the increment / decrement by making it one less or one more than it really is?
I have two types of transactions, kind of like a search and a rent. Only rents will be updating this tally table (and searches just reading from it). I don't mind if rents slow down, but do not want search performance impacted. Rents can occur as frequently as 5-10 / second.
Oracle uses multi-version read consistency to ensure data consistency during a query. The details of how it works can get complex, but the effect is that it guarantees an update/insert to your tally table will only use the data in the main table as it was when the query began. If there is another update or insert to your main table while you are doing an update/insert to the tally table, it won't affect it.
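For the increment itself, a single UPDATE is enough: the row lock serializes concurrent rents on the same (date, category) row, so no increment can be lost. A minimal sketch, assuming a tally table named RENTAL_TALLY (names are illustrative):

update rental_tally
set num_rented = num_rented + 1
where rent_date = :rent_date
and category = :category;

-- and the matching decrement when the rental ends:
update rental_tally
set num_rented = num_rented - 1
where rent_date = :rent_date
and category = :category;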
You'll have to experiment with your data to see if keeping a summary/tally table helps or hurts you. It really depends on how quickly the main table is getting updated, how much time you spend updating your tally table vs how much time you save by being able to select on it, and how up-to-date you need selects to be.

Strategy to improve Oracle DELETE performance

We've got an Oracle 11g installation that is starting to get big. This database is the backend to a parallel optimization system running on a cluster. Input to the process is contained in the database along with output from the optimization steps. The input includes rote configuration data and some binary files (using 11g's SecureFiles). The output includes 1D, 2D, 3D, and 4D data currently stored in the DB.
DB Structure:
/* Metadata tables */
Case(CaseId, DeleteFlag, ...) On Delete Cascade CaseId
OptimizationRun(OptId, CaseId, ...) On Delete Cascade OptId
OptimizationStep(StepId, OptId, ...) On Delete Cascade StepId
/* Data tables */
Files(FileId, CaseId, Blob) /* deletes are near instantaneous here */
/* Data per run */
OnedDataX(OptId, ...)
TwoDDataY1(OptId, ...) /* packed representation of a 1D slice */
/* Data not only per run, but per step */
TwoDDataY2(StepId, ...) /* packed representation of a 1D slice */
ThreeDDataZ(StepId, ...) /* packed representation of a 2D slice */
FourDDataZ(StepId, ...) /* packed representation of a 3D slice */
/* ... About 10 or so of these tables exist */
A reaper script comes around daily and looks for cases with the DeleteFlag = 1 and proceeds with the DELETE FROM Case WHERE DeleteFlag = 1, allowing the cascades to continue.
This strategy works great for read/write, but it is now outstripping our capabilities when we want to purge data! The rub is that deleting a Case takes ~20-40 minutes depending on its size and often overloads our archiver space. The next major version of the product will take a "from the ground up" approach to solving the problem. The next minor release needs to stay within the confines of data stored in the database.
So, for the minor release we need an approach that can improve delete performance and at most require moderate changes to the database.
REF Partitioning, but the question is HOW? I would love to do INTERVAL on Case and REF on the rest, but that isn't supported. Is there some way to manually partition OptimizationRun by CaseId through a trigger?
Disable archiving/redo logs for deletes? Couldn't find a HINT to go with this one. Not sure it is even feasible.
Truncate? This would likely need some sort of complicated table setup. But maybe I'm not considering all of my options. (per answer, stricken)
To help illustrate the issue, the data in question per case ranges from 15MiB to 1.5GiB with anywhere from 20k to 2M rows.
Update: Current size of the DB is ~1.5TB.
Deleting data is a hell of a job for the database. It has to create before-images, update indexes, write redo logs and remove the data. This is a slow process. If you can get a window to perform this task, the easiest and fastest approach is to build new tables containing the wanted data. Drop the old tables and rename the new tables.
This obviously requires some setup work, but it is very doable.
One step less drastic is to drop the indexes before the delete takes place. My vote would go to CTAS (Create Table As Select) to build the new tables.
A nice partitioning schema would certainly be helpful, maybe in the next release Oracle can combine interval and reference partitioning. It would be very nice to have.
Disabling logging cannot be done for deletes, but CTAS can use NOLOGGING. Make a backup when ready, and make sure to transfer the datafiles to the standby database, if you have one.
Just some thoughts:
I assume you have indexes on all foreign keys. ON DELETE CASCADE will hold row-level locks until the Case delete is complete, and with no indexes it will, I believe, hold table locks and of course be super slow (a query sketch for spotting unindexed foreign keys follows these thoughts).
Do you have any deferred constraints? This would most likely slow things down for Oracle cascading through the various table deletes
Have you tried to do the deletes separately for all affected tables (instead of relying on on delete cascade)? Not as easy, but you may be surprised.
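A rough first-pass check for foreign key columns that are not the leading column of any index (composite keys deserve a closer look; this query is a sketch, not exhaustive):

select c.table_name, cc.column_name, c.constraint_name
from user_constraints c
join user_cons_columns cc on cc.constraint_name = c.constraint_name
where c.constraint_type = 'R'
and not exists (select 1
                from user_ind_columns ic
                where ic.table_name = cc.table_name
                and ic.column_name = cc.column_name
                and ic.column_position = 1);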
EDIT:
One more thought. You may consider doing a SOFT delete on Case table, meaning you have a status field that will tell your app if that Case should be considered. This flag could have many different values, but maybe 'A' for active and 'I' for inactive. Assuming you are always using Case as a driving/primary table in joins to other tables, you can avoid the HARD deletes all-together (and occasionally do a cleanup off hours on whatever schedule if you like). Apps would need to be aware of this flag of course, and you'd be tied to joining back to Case table. May or may not fit for your situation...
CASCADE DELETE runs internally slow-by-slow, er, row-by-row.
Some options:
Have your purge job snapshot all the cases to be purged into a scratch table with a CTAS. Then have your purge job loop over that table, deleting each case (and its children) individually. This can be unpleasant, especially if you run into millions of descendant rows. We had to change one of the processes recently at [business redacted] which did that, to determine which ultimate parents had child counts that would be problematic, and then use a rownum limiter on a delete against the problematic child table(s). It's not fast, but at least it's safer from an undo/redo management perspective by placing an upper bound on how big any transaction can be (a sketch of such a rownum-limited delete follows these options).
If you're using CASCADE DELETE as a convenience, you could always not do so. You'd have to write a more sophisticated purge routine that deletes from your dependency tree "bottom up".
If you can afford the undo/redo generation on the soft delete, you could range-partition the ultimate parent on DeleteFlag, then partition the children BY REFERENCE, all tables using ENABLE ROW MOVEMENT. You'd incur undo/redo costs for moving the rows when soft-deleted, but when it came time to finally purge, it would be truncating partitions where DeleteFlag = 1, nothing more.
Adding storage is relatively cheap. If there's a date-based retention option, use it, and just have the soft delete option hide the data from the application front end. It's inelegant, but then, so is CASCADE DELETE.
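A hedged sketch of the rownum-limited delete against one problematic child table; the table and column names only mirror the pseudo-schema above (with "Case" written as case_tab because CASE is a reserved word), and the batch size is arbitrary:

begin
  loop
    delete from threeddataz
    where stepid in (select s.stepid
                     from optimizationstep s
                     join optimizationrun r on r.optid = s.optid
                     join case_tab c on c.caseid = r.caseid
                     where c.deleteflag = 1)
    and rownum <= 50000;
    exit when sql%rowcount = 0;
    commit;
  end loop;
  commit;
end;
/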
Not advised for live database.
1. I disabled the foreign key constraints referencing the table that is slow to delete.
2. I executed the delete.
3. I enabled the foreign keys again.
Use Enterprise Manager to create an AWR report and run it through Statspack Analyzer, which will give you detailed instructions about the bottlenecks in your system. An AWR report is a text file containing all kinds of data about what the database has done during a certain period and how long it took. That Statspack Analyzer is sort of an automatic DBA telling you what to do.
Forget partitions until Statspack Analyzer tells you that they could be useful and you've got a few idle disks that you can use to distribute the I/O.
Don't think about truncate. It forces a commit...
BTW, I'm not affiliated with Statspack Analyzer, but I think it's a very viable general tuning approach for Oracle, especially if there's no DBA around.

overcoming 'log file sync' by design?

Advice/suggestions needed for a bit of application design.
I have an application which uses 2 tables: one is a staging table, which many separate processes write to; once a 'group' of processes has finished, another job comes along and aggregates the results into a final table, then deletes that 'group' from the staging table.
The problem that I'm having is that when the staging table is being cleared out, lots of redo is generated and I'm seeing a lot of 'log file sync' waits in the database. This is a shared database with many other applications and this is causing some issues.
When applying the aggregate, the rows are reduced to about 1 row in the final table for every 20 rows in the staging table.
I'm thinking of getting around this by creating a table for each 'group' rather than having a single 'staging' table. Once a group is done, its table can just be dropped, which should result in much less redo.
I only have SE, so partitioned tables aren't an option. Faster disks for the redo probably aren't an option in the short term either.
Is this a bad idea? Any better solutions to be offered?
Thanks.
Would it be possible to solve the problem by having your process do a logical delete (i.e. set a DELETE_FLAG column in the table to 'Y') and then having a nightly process that truncates the table (potentially writing any non-deleted rows to a separate table before the truncate and then copy them back after the table is truncated)?
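That nightly process could look roughly like this (a sketch only; DELETE_FLAG, the staging table name and the direct-path insert are assumptions):

-- during the day, the aggregation job marks rows instead of deleting them:
update staging set delete_flag = 'Y' where group_id = :grp;

-- nightly: keep the survivors aside, truncate (which generates almost no redo), copy them back
create table staging_keep as select * from staging where delete_flag = 'N';
truncate table staging;
insert /*+ append */ into staging select * from staging_keep;
commit;
drop table staging_keep purge;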
Are you certain that the source of the log file sync waits is that your disks can't keep up with the I/O? That's certainly possible, of course, but there are other possible causes of excessive log file sync waits including excessive commits. There is an excellent article on tuning log file sync events on the Pythian blog.
The most common cause of excessive log file syncs is too frequent commits, which are often deliberately coded in a mistaken attempt to reduce system load due to locking. You should commit only when your business transaction is complete.
Loading each group into a separate table sounds like a fine plan to reduce redo. You can truncate the individual group table following each aggregation.
Another (but I think probably worse) option is to create a new staging table with the groups that haven't been aggregated then drop the original and rename the new table to replace the staging table.
I prefer Justin's suggestion ("logical delete"), but another option to consider might be a partitioned table, if you have the EE licence. The aggregation process could drop a partition instead of deleting the rows.
