Need to drop 3 columns from table having about 3 million rows. It is taking about 3 hours to drop 3 columns. I'm thinking of using CHECKPOINT but not sure if what CHECKPOINT number I need to use. Also is it safe to use CHECKPOINT option?
We are on Oracle 12.2
So far, I've tried this:
ALTER TABLE <table1> SET UNUSED (<col1>, <col2>, <col3>);
ALTER TABLE <table1> DROP UNUSED COLUMNS;
Is your goal to improve performance, or to conserve UNDO space used?
CHECKPOINT will not improve performance. Your command must alter all 3 million rows of data, regardless. CHECKPOINT will limit the amount of UNDO rows that exist at any one time, but it won't limit the total number of UNDO rows that need to be created over the course of the transaction. If anything, checkpointing - which will clear UNDO records and write more to REDO - will introduce even more disk I/O operations to your transaction and slow it down further.
CHECKPOINT is really only useful if you have limited disk capacity for your UNDO tablespace, in which case the number of rows should depend on the amount of UNDO space used per row and the total amount of space you can allow for the transaction. That may take experimentation to determine - start high and work down until the transaction completes - you want was few checkpoints as possible while staying within your UNDO storage threshhold.
Also, per the documentation (https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/ALTER-TABLE.html#GUID-552E7373-BF93-477D-9DA3-B2C9386F2877) anything that goes wrong after a checkpoint has been applied and before the transaction is complete may leave the table in an unstable/unusable state. Therefore it is not entirely safe. Use with caution, and be prepared to restore from backup if things go wrong.
If this is a one off operation, just re-create the table without the columns.
Related
I am very new to oracle and I have below query.
I have one table which has almost 6 L records.
In daily batch i need to delete almost 5.7 L record and insert it again in from another table. Note that i can not use truncate table because 30000 records are constant one which I should not delete.
Issue here is if I delete daily 5.67 L record, it may cause for High WaterMark issue.
So my query is can Gather Stats helps to reduce HWM?
I can do Oracle Gather Stats Daily.
You can use the SHRINK command to recover the space and reset the high water mark:
alter table your_table shrink space;
However, you should only do this if you need to. In your case it seems likely you need only do this if you are inserting your 567,000 records using the /* APPEND */ hint; this hint tells the optimiser to insert records above the HWM, which in your scenario would cause the table to grow, with vast amounts of empty space. Shrinking would definitely be useful here.
If you're just inserting records without the hint then they will mainly reuse the empty space vacated by the prior deletion, so you don't need to concern yourself with the HWM.
Incidentally, deleting and re-inserting 5.67L records every day sounds rather poor practice. There are probably better solutions (such as MERGE), depending on the underlying business rules you're trying to satisfy.
If you reinsert the same amount of data, the table should be roughly the same size. Therefore, I would not worry about the high water mark too much.
Having said that, it is good practice to gather statistics if the data has changed substantially, so I would recommend that.
The nature of my application involves daily deleting and bulk inserting of large datasets into an Oracle 12c database. My tables are interval-partitioned by a date field and partitioned-indexed. I use a stored procedure to gather statistics for the affected partitions after each run. Lately, I found that the runs have been slowing down considerably and was wondering if this was due to the increasing size of the database.
I have searched for how to calculate the total disk space that my tables use and usually arrive at this:
select sum(bytes)/1024/1024/1024
from dba_segments
where owner='SCHEMA' and segment_name in ('TABLE_A', 'TABLE_B');
However, the numbers were huge and do not reflect the actual data volume used. When we exported the tables for restoration to another database, the file was much smaller than that query suggests. I dug deeper and arrived at this query instead:
select partition_name,
blocks*8/1024 size_m,
num_rows*avg_row_len/1024/1024 occ_m,
blocks*8/1024 - num_rows*avg_row_len/1024/1024 wast_m
from dba_tab_partitions
where table_name='TABLE_A';
This query suggests that there is a "wasted" space concept where after performing bulk inserts and deleting the data before it is replaced again, the space used is not reclaimed.
Thus I have the following questions:
Does the "wasted" space contribute to performance degradation when I
perform delete from table where ..?
Is there a difference between
performing a delete from table where .. as compared to dropping
the partitions with regard to "wasted" space?
Is performing table reorganization / defragmentation on a regular basis to reclaim table space a recommended practice?
Does the "wasted" space contribute to performance degradation when I perform delete from table where ..?
Yes, you are deleting from table Oracle has to to perform Full Tabl Scan/Index Range Scan(Index leaf node may lead to empty blocks) on the underlying table up to High Water Mark, which makes your delete slow.
Is there a difference between performing a delete from table where .. as compared to dropping the partitions with regard to "wasted" space?
Deleting is a slow process. It has to create before images(undo), update indexes, write redo logs and remove the data. Since DDL(Drop) doesnt generate redo/undo(Generate tiny bit of undo/redo for meta data) it would be faster than DML(delete).
Is performing table reorganization/defragmentation on a regular basis to reclaim table space a recommended practice?
Objects with fragmented free space can result in much wasted space, and can impact database performance. The preferred way to defragment and reclaim this space is to perform an online segment shrink.
For details:Reclaiming Unused Space
The following blog post demostrate the performance impact during DML becuase of wasted space and how to get rid of it.
Defragmentation Can Degrade Query Performance
If you're doing deletes or updates your space is getting fragmented. You can read about it in documentation.
To improve your process you can either perform some cleaning operations like shrink or just recreate tables on some big inserts. I mean instead of doing delete and insert do create table as select from old where rows not to delete and then insert new set into new table. After that just swap names and drop old table.
With your second question I think answer is here. Dropping partition will reduce HWM and delete will not.
This query suggests that there is a "wasted" space concept where after performing bulk inserts and deleting the data before it is replaced again, the space used is not reclaimed.
This is correct.
A direct path insert uses space above the high water mark for the segment. Subsequent deletes remove rows, but do not reset the high water mark.
It would be best to be able to truncate the segment prior to performing another direct path insert, as this resets the high water mark as well as removing all the rows.
A brand new application with Oracle as DataStore is going to be pushed in Production. The Databases use CBO and I have indentified some columns to do indexing. I am expecting the total number of records in a particular table to be 4 million after 6 months. After that very few records will be added and there will not be any updates in the records of Indexed columns. I mean most of the updates will be on NonIndexed columns.
Is it advisable to create Index now? or I need to wait for a couple of months?
If table requires indexes, you will incur a lot of poor performance (full table scan + actual I/O) after the number of rows in the table goes beyond what might reasonably be kept the cache. Assume that is 20000 rows. We'll call it magic number. You hit 20000 rows in a week of production. After that the queries and updates on the table will grow progressively slower, on average, as more rows are added.
You are probably worried about the overhead of inserting new rows with indexed fields. That is a one-time hit. You a trading that against dozens of queries and updates when you delay adding indexes.
The trade off is largely in favor of adding indexes right now. Especially since we do not know what that magic number (20000?) really is. Could be larger. Or smaller.
Given: SQL Server 2008 R2. Quit some speedin data discs. Log discs lagging.
Required: LOTS LOTS LOTS of inserts. Like 10.000 to 30.000 rows into a simple table with two indices per second. Inserts have an intrinsic order and will not repeat, as such order of inserts must not be maintained in short term (i.e. multiple parallel inserts are ok).
So far: accumulating data into a queue. Regularly (async threadpool) emptying up to 1024 entries into a work item that gets queued. Threadpool (custom class) has 32 possible threads. Opens 32 connections.
Problem: performance is off by a factor of 300.... only about 100 to 150 rows are inserted per second. Log wait time is up to 40% - 45% of processing time (ms per second) in sql server. Server cpu load is low (4% to 5% or so).
Not usable: bulk insert. The data must be written as real time as possible to the disc. THis is pretty much an archivl process of data running through the system, but there are queries which need access to the data regularly. I could try dumping them to disc and using bulk upload 1-2 times per second.... will give this a try.
Anyone a smart idea? My next step is moving the log to a fast disc set (128gb modern ssd) and to see what happens then. The significant performance boost probably will do things quite different. But even then.... the question is whether / what is feasible.
So, please fire on the smart ideas.
Ok, anywering myself. Going to give SqlBulkCopy a try, batching up to 65536 entries and flushing them out every second in an async fashion. Will report on the gains.
I'm going through the exact same issue here, so I'll go through the steps i'm taking to improve my performance.
Separate the log and the dbf file onto different spindle sets
Use basic recovery
you didn't mention any indexing requirements other than the fact that the order of inserts isn't important - in this case clustered indexes on anything other than an identity column shouldn't be used.
start your scaling of concurrency again from 1 and stop when your performance flattens out; anything over this will likely hurt performance.
rather than dropping to disk to bcp, and as you are using SQL Server 2008, consider inserting multiple rows at a time; this statement inserts three rows in a single sql call
INSERT INTO table VALUES ( 1,2,3 ), ( 4,5,6 ), ( 7,8,9 )
I was topping out at ~500 distinct inserts per second from a single thread. After ruling out the network and CPU (0 on both client and server), I assumed that disk io on the server was to blame, however inserting in batches of three got me 1500 inserts per second which rules out disk io.
It's clear that the MS client library has an upper limit baked into it (and a dive into reflector shows some hairy async completion code).
Batching in this way, waiting for x events to be received before calling insert, has me now inserting at ~2700 inserts per second from a single thread which appears to be the upper limit for my configuration.
Note: if you don't have a constant stream of events arriving at all times, you might consider adding a timer that flushes your inserts after a certain period (so that you see the last event of the day!)
Some suggestions for increasing insert performance:
Increase ADO.NET BatchSize
Choose the target table's clustered index wisely, so that inserts won't lead to clustered index node splits (e.g. autoinc column)
Insert into a temporary heap table first, then issue one big "insert-by-select" statement to push all that staging table data into the actual target table
Apply SqlBulkCopy
Choose "Bulk Logged" recovery model instad of "Full" recovery model
Place a table lock before inserting (if your business scenario allows for it)
Taken from Tips For Lightning-Fast Insert Performance On SqlServer
I'm wondering if you have a table that contains 24 million record, how does that impact performance (does each insert/update/delete) take significantly longer to go through?
This is our Audit table, so when we make change changes in other tables we log then on to the Audit tale, does it also take significantly longer to carry out these update as well ?
The right answer is "it depends", of course...
But as far as I get, your concern is in how Audit table affects performance of queries (on other tables) when Audit table grows.
Probably you only insert into your Audit table. Insert time doesn't depend on amount of data already in table. So, no matter how big Audit table is, it should affect performance equally (given that database design isn't incredibly bad).
Of course, select or delete on Audit table itseft can take longer when the table grows.
If I read your question as "does a large Oracle table take longer for IUD operations", generally speaking the answer is no. I think the most impact on the insert/update/delete operations will be felt from the indexes present on this table (more indexes = slower performance for these operations).
However, if your auditing logic needs to look up existing rows in the audit table for procedural logic in some manner that doesn't use primary or unique keys, then there will be a performance impact with a large table.
There are many factors that come into play in regards to how fast an insert/update/delete occurs. For example, how many indexes are on the table? If a table has many indexes and you insert/update the table, it can cause the operation to take longer. How is the data stored in the physical structures of the database (i.e. the tablespaces if you're using Oracle, for example)? Are your indexes and data on separate disks, which can help speed up I/O?
Obviously, if you are writing out audit records then it can affect performance. But in a well-tuned database, it shouldn't be slowing it down enough to where you notice.
The approach I use for audit tables is to use triggers on the main tables and these triggers write out the audit records. But from a performance standpoint, it really depends on a lot of factors as to how fast the updates to your main tables will run.
I would recommend looking at the explain plan output for one of your slow updates if you are using Oracle (other DBs usually have such tools as well, google can help here). You can then see what plan the optimizer is generating and diagnose where the problems could be. You could potentially get a DBA to assist you as well to help figure out what's causing the slowness.
I'd suspect performance will be more related to contention than table size. Generally inserts happen at the 'end' of the table. Sessions inserting into that table have to take a latch on the block while they are writing records to it. During that time other sessions may have to wait for that block (which is where you may see busy buffer wait events).
But really you need to look at the SQLs in question and see what they are waiting on, and whether they are contributing significantly to an unacceptable level of performance. Then determine a course of action based on the specific wait events that are causing the problem.
Check out anything from Cary Milsap on performance tuning
The impact of table size is different for INSERT,DELETE AND UPDATE operation
Insert statement is not impacted much by table size as when we Insert data into table it will add to the next data block available.If there are Indexes on that particular table then Oracle has to search for particular data block before inserting data in that block ,which require search operation that need time.
Delete and Update statements are impacted by Table size as more the data more time is require to search for the particular row to Delete and Update operation