I would like to use parallel execution on LET_TESTATE_LETTURE without forcing the full table scan, I want to use force the parallelism on index.
How can I solve?
alter session enable parallel dml;
CREATE TABLE netatemp.let_testate_letture1
AS
SELECT /* parallel(tele 32) full(tele) */
tele.TELE_DATA_LETTURA,
tele.tele_storico_id
FROM let_testate_letture tele
WHERE tele.prov_provenienza_lettura_id = '*1ENI01BCAMBIO'
AND tele.spwkf_stato_pubblico_id != '*1UNICOANN';
Size 56,1 GB
Number Extents 1.081
OWNER SIUMETERING
TABLE_NAME LET_TESTATE_LETTURE
TABLESPACE_NAME SIUMETERING_DATITD
CLUSTER_NAME
IOT_NAME
STATUS VALID
PCT_FREE 10
PCT_USED
INI_TRANS 30
MAX_TRANS 255
INITIAL_EXTENT 80 KB
NEXT_EXTENT 1 MB
MIN_EXTENTS 1
MAX_EXTENTS 2.147.483.645
PCT_INCREASE
FREELISTS
FREELIST_GROUPS
LOGGING YES
BACKED_UP N
NUM_ROWS 456.635.338
BLOCKS 3.340.120
EMPTY_BLOCKS 0
AVG_SPACE 0
CHAIN_CNT 0
AVG_ROW_LEN 385
AVG_SPACE_FREELIST_BLOCKS 0
NUM_FREELIST_BLOCKS 0
DEGREE 1
INSTANCES 1
CACHE N
TABLE_LOCK ENABLED
SAMPLE_SIZE 456.635.338
LAST_ANALYZED 29/12/2012 13:03:15
PARTITIONED NO
IOT_TYPE
TEMPORARY N
SECONDARY N
NESTED NO
BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT
CELL_FLASH_CACHE DEFAULT
ROW_MOVEMENT DISABLED
GLOBAL_STATS YES
USER_STATS NO
DURATION
SKIP_CORRUPT DISABLED
MONITORING YES
CLUSTER_OWNER
DEPENDENCIES DISABLED
COMPRESSION ENABLED
COMPRESS_FOR OLTP
DROPPED NO
READ_ONLY NO
SEGMENT_CREATED YES
RESULT_CACHE DEFAULT
you have to alter the index to parallel. ie
alter index xxx parallel;
or
alter index xxx parallel <n>;
as the parallel hint only applies to tables.
Try
/*+ parallel_index(tele, let_tele_letb_prov_fk_idx, 32) */
Notice the "+" after the asterisk. Without it Oracle will ignore the hint.
Also, you may want to create the table in parallel as well depending on the nr of rows returned, like:
CREATE TABLE netatemp.let_testate_letture1 parallel 32 as
select /*+ ...
You posted a lot of useful information, which is very refreshing. So in addition to answering your question, I can provide
other advice for improving performance:
Statement-level hint.
Since you are on 11gR2 (based on the existence of the column SEGMENT_CREATED), you should use a
statement-level parallel hint, instead of an object-level. Use /*+ parallel(32) */, and then Oracle will parallelize everything in the query, regardless of the access method or the alias names.
Old stats or old data?
Last Analyzed on 2012-12-29 seems kind of old. If your table is very active, then you should re-gather statistics. If it really doesn't change very often, you may want to consider re-creating it, ordered by prov_provenienza_lettura_id. That may significantly improve the performance of your index. Although it could decrease performance of other indexes.
Compression.
Your table uses compression, but is your index also compressed? If you really have 9 million entries for the same value, index compression could save a huge amount of space, and make index reads much faster. Also, a bitmap index may be a good fit here.
Full table scans.
The optimizer thinks you're going to read about 2% of your rows. Even 2% may be enough to warrant a full table scan, depending on things such as the clustering factor. You may not want to try to force a specific access method - first let Oracle try to pick. If Oracle gets it wrong, then you should try to help it by providing more useful information, such as better statistics and maybe a histogram.
The advice from DazzaL and Ronnis should also be helpful.
Related
This is a re-occuring Problem for me. I have statements that work well for a while and after a while the optimizer decides to choose another execution plan. This even happens for when I query for exactly one (composite) primary key.
When I look up the execution plan in dba_hist_sql_plan, it shows me costs of 20 for the query using the primary key index and costs of 270 for the query doing a full table scan.
plan_hash_value Operation Options Cost Search_Columns
2550672280 0 SELECT STATEMENT 20
2550672280 1 PARTITION HASH SINGLE 20
2550672280 2 TABLE ACCESS BY LOCAL INDEX ROWID 20
2550672280 3 INDEX RANGE SCAN 19 1
3908080950 0 SELECT STATEMENT 270
3908080950 1 PARTITION HASH SINGLE 270
3908080950 2 TABLE ACCESS FULL 270
I already noticed that the optimizer only uses the first column in the Primary key index and then does a range scan. But my real question is: Why does the optimizer choose the higher cost execution plan? It's not that both executions plans are used at the same time, I notice a switch within one snapshot and then it stays like that for several hours/days. So it can't be an issue of bind peeking.
Our current solution is that I call our DBA and he flushes the Statement Cache. But this is not really sustainable.
EDIT:
The SQL looks something like this: select * from X where X.id1 = ? and X.id2 = ? and X.id3 = ?
with (id1,id2,id3) being the composite primary key (with a unique index) on the table.
Maybe it's related to one bug on Oracle 11g.
Bug 18377553 : POOR CARDINALITY ESTIMATE WITH HISTOGRAMS AND VALUES > 32 BYTES
When your data is like :
AAAAAAAAAAAAAAAAAAAAmyvalue
AAAAAAAAAAAAAAAAAAAAsomeohtervalue
AAAAAAAAAAAAAAAAAAAAandsoon
B1234
Histograms do not work well.
The solution is disabling histograms on primary key and all will start working smoothly.
Most likely clustering factor and blevel of the index could be very high. Check the blevel by querying dba_indexes. If blevel is greater than 3 try rebuilding the index.
Also check whether the index created for primary key is unique or not. As per the plan it is using range scan instead of unique scan. Most likely the index is not unique.
Apparently the optimizer doesn't correctly display costs regarding type conversions. The root cause for this Problem was incorrect type mapping for a date value. While the column in the database is of type DATE, the JDBC type was incorrectly java.sql.Timestamp. To compare a DATE column with a Timestamp search parameter, all values in the table need to be transferred to Timestamp first. Which is additional cost and renders an index unusable.
I am developing a DWH on Oracle 11g. We have some big tables (250+ million rows), partitioned by value. Each partition is a assigned to a different feeding source, and every partition is independent from others, so they can be loaded and processed concurrently.
Data distribution is very uneven, we have partition with millions rows, and partitions with not more than a hundred rows, but I didn't choose the partitioning scheme, and by the way I can't change it.
Considered the data volume, we must assure that every partition has always up-to-date statistics, because if the subsequent elaborations don't have an optimal access to the data, they will last forever.
So for each concurrent ETL thread, we
Truncate the partition
Load data from staging area with
SELECT /*+ APPEND */ INTO big_table PARTITION(part1) FROM temp_table WHERE partition_colum = PART1
(this way we have direct path and we don't lock the whole table)
We gather statistics for the modified partition.
In the first stage of the project, we used the APPROX_GLOBAL_AND_PARTITION strategy and worked like a charm
dbms_stats.gather_table_stats(ownname=>myschema,
tabname=>big_table,
partname=>part1,
estimate_percent=>1,
granularity=>'APPROX_GLOBAL_AND_PARTITION',
CASCADE=>dbms_stats.auto_cascade,
degree=>dbms_stats.auto_degree)
But, we had the drawback that, when we loaded a small partition, the APPROX_GLOBAL part was dominant (still a lot faster than GLOBAL) , and for a small partition we had, e.g., 10 seconds of loading, and 20 minutes of statistics.
So we have been suggested to switch to the INCREMENTAL STATS feature of 11g, which means that you don't specify the partition you have modified, you leave all parameters in auto, and Oracle does it's magic, automatically understanding which partition(s) have been touched. And it actually works, we have really speeded up the small partition. After turning on the feature, the call became
dbms_stats.gather_table_stats(ownname=>myschema,
tabname=>big_table,
estimate_percent=>dbms_stats.auto_sample_size,
granularity=>'AUTO',
CASCADE=>dbms_stats.auto_cascade,
degree=>dbms_stats.auto_degree)
notice, that you don't pass the partition anymore, and you don't specify a sample percent.
But, we're having a drawback, maybe even worse that the previous one, and this is correlated with the high level of parallelism we have.
Let's say we have 2 big partition that starts at the same time, they will finish the load phase almost at the same time too.
The first thread ends the insert statement, commits, and launches the stats gathering. The stats procedure notices there are 2 partition modified (this is correct, one is full and the second is truncated, with a transaction in progress), updates correctly the stats for both the partitions.
Eventually the second partition ends, gather the stats, it see all partition already updated, and does nothing (this is NOT correct, because the second thread committed the data in the meanwhile).
The result is:
PARTITION NAME | LAST ANALYZED | NUM ROWS | BLOCKS | SAMPLE SIZE
-----------------------------------------------------------------------
PART1 | 04-MAR-2015 15:40:42 | 805731 | 20314 | 805731
PART2 | 04-MAR-2015 15:41:48 | 0 | 16234 | (null)
and the consequence is that I occasionally incur in not optimal plans (which mean killing the session, refresh manually the stats, manually launch the precess again).
I tried even putting an exclusive lock on the gathering, so no more than one thread can gather stats on the same table at once, but nothing changed.
IMHO this is an odd behaviour, because the stats procedure, the second time it is invoked, should check for the last commit on the second partition, and should see it's newer than the last stats gathering time. But seems it's not happening.
Am I doing something wrong? Is it an Oracle bug? How can I guarantee that all stats are always up-to-date with incremental stats feature turned on, and an high level of concurrency?
I managed to reach a decent compromise with this function.
PROCEDURE gather_tb_partiz(
p_tblname IN VARCHAR2,
p_partname IN VARCHAR2)
IS
v_stale all_tab_statistics.stale_stats%TYPE;
BEGIN
BEGIN
SELECT stale_stats
INTO v_stale
FROM user_tab_statistics
WHERE table_name = p_tblname
AND object_type = 'TABLE';
EXCEPTION
WHEN NO_DATA_FOUND THEN
v_stale := 'YES';
END;
IF v_stale = 'YES' THEN
dbms_stats.gather_table_stats(ownname=>myschema,
tabname=> p_tblname,
partname=>p_partname,
degree=>dbms_stats.auto_degree,
granularity=>'APPROX_GLOBAL AND PARTITION') ;
ELSE
dbms_stats.gather_table_stats(ownname=>myschema,
tabname=>p_tblname,
partname=>p_partname,
degree=>dbms_stats.auto_degree,
granularity=>'PARTITION') ;
END IF;
END gather_tb_partiz;
At the end of each ETL, if the number of added/deleted/modified rows is low enough not to mark the table as stale (10% by default, can be tuned with STALE_PERCENT parameter), I collect only partition statistics; otherwise i collect global and partition statistics.
This keeps ETL of small partition fast, because no global partition must be regathered, and big partition safe, because any subsequent query will have fresh statistics and will likely use an optimal plan.
Incremental stats is anyway enabled, so whenever the global has to be recalculated, it is pretty fast because aggregates partition level statistics and does not perform a full scan.
I am not sure if, with incremental enabled, "APPROX_GLOBAL AND PARTITION" and "GLOBAL AND PARTITION" do differ in something, because both incremental and approx do basically the same thing: aggregate stats and histograms without doing a full scan.
Have you tried to have incremental statistics on, but still explicitly name a partition to analyze?
dbms_stats.gather_table_stats(ownname=>myschema,
tabname=>big_table,
partname=>part,
degree=>dbms_stats.auto_degree);
For your table, stale (yesterday's) global stats are not as harmful as completely invalid partition stats (0 rows). I can propose 2 a bit alternative approaches that we use:
Have a separate GLOBAL stats gathering executed by your ETL tool right after all partitions are loaded. If it's taking too long, play with estimate_percent as dbms_stats.auto_degree will likely to be more than 1%
Gather the global (as well as all other stale) stats in a separate database job run later during the day, after all data is loaded into DW.
The key point is that stale statistics which differ only slightly from fresh are almost just as good. If statistics show you 0 rows, they'll kill any query.
Considering what you are trying to achieve, you need to run stats on specific intervals of time for all Partitions and not at the end of the process that loads each partition. It could be challenging if this is a live table and has constant data loads happening round the clock, but since these are LARGE DW tables I really doubt that's the case. So the best bet would be to collect stats at the end of loading all partitions, this will ensure that the statistics is collected for partitions where data has change or statistics are missing and update the global statistics based on the partition level statistics and synopsis.
However to do so, you need to turn on incremental feature for the table (11gR1).
EXEC DBMS_STATS.SET_TABLE_PREFS('<Owner>','BIG_TABLE','INCREMENTAL','TRUE');
At the end of every load, gather table statistics using GATHER_TABLE_STATS command. You don't need to specify the partition name. Also, do not specify the granularity parameter.
EXEC DBMS_STATS.GATHER_TABLE_STATS('<Owner>','BIG_TABLE');
Kindly check if you have used DBMS_STATS to set table preference to gather incremental statistics.This oracle blog explains that statistics will be gathered after each row affected.
Incremental statistics maintenance needs to gather statistics on any partition that will change the global or table level statistics. For instance, the min or max value for a column could change after just one row is inserted or updated in the table
BEGIN
DBMS_STATS.SET_TABLE_PREFS(myschema,'BIG_TABLE','INCREMENTAL','TRUE');
END;
I'm a bit rusty about it, so first of all a question:
did you try serializing partition loading? If so, how long and how well does statistics run? Notice that since loading time is so much smaller than statistics gathering, i guess this could also act as a temporary workaround.
Append hint does affects redo size, meaning the transaction just traces something, thus statistics may not reckon new data:
http://oracle-base.com/articles/misc/append-hint.php
Thinking out loud: since the direct path insert does append rows at the end of the partition and eventually updates metadata at the end, the already running thread gathering statistics could have read non-updated (stale) data. Thus it may not be a bug, and locking threads would accomplish nothing.
You may test this behaviour temporarily switching your table/partition to LOGGING, for instance, and see how it works (slower, of course, but it's a test). Can you do it?
EDIT: incremental stats should work anyway, even disabling a parallel statistics gathering, since it reiles on the incremental values no matter how they were collected:
https://blogs.oracle.com/optimizer/entry/incremental_statistics_maintenance_what_statistics
I have a table that is 5 GB, now I was trying to delete like below:
delete from tablename
where to_char(screatetime,'yyyy-mm-dd') <'2009-06-01'
But it's running long and no response. Meanwhile I tried to check if anybody is blocking with this below:
select l1.sid, ' IS BLOCKING ', l2.sid
from v$lock l1, v$lock l2
where l1.block =1 and l2.request > 0
and l1.id1=l2.id1
and l1.id2=l2.id2
But I didn't find any blocking also.
How can I delete this large data without any problem?
5GB is not a useful measurement of table size. The total number of rows matters. The number of rows you are going to delete as a proportion of the total matters. The average length of the row matters.
If the proportion of the rows to be deleted is tiny it may be worth your while creating an index on screatetime which you will drop afterwards. This may mean your entire operation takes longer, but crucially, it will reduce the time it takes for you to delete the rows.
On the other hand, if you are deleting a large chunk of rows you might find it better to
Create a copy of the table using
'create table t1_copy as select * from t1
where screatedate >= to_date('2009-06-01','yyyy-mm-dd')`
Swap the tables using the rename command.
Re-apply constraints, indexs to the new T1.
Another thing to bear in mind is that deletions eat more UNDO than other transactions, because they take more information to rollback. So if your records are long and/or numerous then your DBA may need to check the UNDO tablespace (or rollback segs if you're still using them).
Finally, have you done any investigation to see where the time is actually going? DELETE statements are just another query, and they can be tackled using the normal panoply of tuning tricks.
Use a query condition to export necessary rows
Truncate table
Import rows
If there is an index on screatetime your query may not be using it. Change your statement so that your where clause can use the index.
delete from tablename where screatetime < to_date('2009-06-01','yyyy-mm-dd')
It runs MUCH faster when you lock the table first. Also change the where clause, as suggested by Rene.
LOCK TABLE tablename IN EXCLUSIVE MODE;
DELETE FROM tablename
where screatetime < to_date('2009-06-01','yyyy-mm-dd');
EDIT: If the table cannot be locked, because it is constantly accessed, you can choose the salami tactic to delete those rows:
BEGIN
LOOP
DELETE FROM tablename
WHERE screatetime < to_date('2009-06-01','yyyy-mm-dd')
AND ROWNUM<=10000;
EXIT WHEN SQL%ROWCOUNT=0;
COMMIT;
END LOOP;
END;
Overall, this will be slower, but it wont burst your rollback segment and you can see the progress in another session (i.e. the number of rows in tablename goes down). And if you have to kill it for some reason, rollback won't take forever and you haven't lost all work done so far.
I have a partitioned table like so:
create table demo (
ID NUMBER(22) not null,
TS TIMESTAMP not null,
KEY VARCHAR2(5) not null,
...lots more columns...
)
The partition is on the TS column (one partition per year).
Since I search a lot via the timestamp, I created a combined index:
create index demo.x1 on demo (ts, key);
The query looks like this:
select *
from demo t
where t.TS = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS')
I also tried to add and t.KEY = '00101' but that doesn't help.
But EXPLAIN PLAN says that TABLE ACCESS and FULL:
# Operation Options Object Mode Cost Bytes Cardinality
0 SELECT STATEMENT ALL_ROWS 583804 287145 2127
1 PARTITION RANGE ALL 583804 287145 2127
2 TABLE ACCESS FULL HEADER ANALYZED 583804 287145 2127
No mention of the index. What could be wrong?
[EDIT] For some reason, Oracle completely miscalculated the cost for the operation. I have 112 million rows in that table. The cost for a full scan of a single partition should be 20 million, not 600'000. That's why it even ignores optimizer hints.
[EDIT2] During my tests, I ran over this puzzling result. When I run this select:
select tx_ts
from kt.header
where tx_ts = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS')
I get this EXPLAIN PLAN:
0 SELECT STATEMENT ALL_ROWS 152 15616 1952
1 PARTITION RANGE ALL 152 15616 1952
2 INDEX FAST FULL SCAN HEADERX2 ANALYZED 152 15616 1952
So when I restrict myself to the indexed column as the result of the select, Oracle decides to use the index. When I want to get all columns, I have to wait for a full table scan. What's going on here?
[EDIT2] Found it; see my answer below.
Okay, it was a mistake on my part: The column had the type DATE, not TIMESTAMP. Since I used to_timestamp(), Oracle saw no way to use the index.
I'm not really an expert on partitioning, but I think what has happened here is that you have created a global index -- a single index that covers rows in all of the partitions. Therefore, the optimizer has to choose between two mutually exclusive access paths: (A) an index range scan, or (B) partition pruning. I believe the PARTITION RANGE operation indicates that it has chosen B.
Updating statistics, as others have suggested, may change the behavior. When you drop and recreate the index, you discard any statistics that existed for the index.
Creating the index as UNIQUE, if the timestamp and key uniquely identify a row, would be a good idea and might change the behavior as well.
However, I think the real "fix" is that you should instead create local indexes -- a separate index on each partition. This should enable the optimizer to do partition pruning followed by an index lookup. Honestly, I'm not sure what the exact syntax is to do this. Maybe you just create the index on each partition explicitly using the individual partition names.
If everything else fails, you might try an optimizer hint:
select /*+ index(demo.demo demo.x1) */ *
from demo t
where t.TS = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS')
Are your stats up to date? Invalid stats may mean that oracle believes a full table scan is faster than using the index. Are you using any hints in your query that might be telling oracle to do a full scan?
Can you supply us with the full query and explain plan results?
Edit: Aaron, you can update the stats using "dbms_stats.gather_schema_stats" or "dbms_stats.gather_table_stats" commands. See here for more information on the commands. This will update all the relevant stats for the schema or table specified. Oracle's Cost Based Optimiser will use the statistics to determine which execution plan to choose. It never uses the actual table sizes. You'll need to re-update your stats when the size of your table changes significantly ( +/- 10% or so)
Another thing. When you use a compound index, you need to specify all the columns used in the index in your query for the optimizer to consider the index (and I think you need to specify them in the same order as well, though I could be wrong about that, it's been a while since I looked at this stuff)
There may just be a typo in your transcription of the "CREATE INDEX..." statement that you posted, but are you sure you actually have created the index?
To give us some first-pass idea of the statistics, use these queries:
select table_name, num_rows
from user_tables
where table_name = 'DEMO';
select table_name, num_rows
from user_tab_partitions
where table_name = 'DEMO';
select index_name, num_rows from user_indexes
where table_name in
(select table_name
from user_tables where table_name = 'DEMO');
Also, exactly how are you generating the EXPLAIN PLAN? Do you have access to the database host to retrieve a trace file if you enable tracing?
[edit]
As I commented, it would be good to see the trace of an actual execution of the query. Since you've indicated you have access to the db host filesystems, run a SQL script that (in the same session) issues the following:
alter session set sql_trace=true;
select /* THIS IS THE TRACE */
*
from demo t
where t.TS = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS');
exit
After the script exits, find out
where the trace file directory is by
this query:
select value from v$parameter where name = 'user_dump_dest';
Use whatever searching tool is
available to you to find the file
that includes the string "THIS IS THE
TRACE"
Process/profile the trace file by
issuing the OS command tkprof
traceFileName.trc tkprof.out
Examine this file - you'll see some overhead information, but there will be a section that details the actual execution plan and statistics for the query. If you see the same results in this information then the next step is to add another statement (after the first "alter session") that will dump information on why the Oracle CBO is ignoring the index:
ALTER SESSION SET EVENTS='10053 trace name context forever, level 1';
(Database: Oracle 10G R2)
It takes 1 minute to insert 100,000 records into a table. But if the table already contains some records (400K), then it takes 4 minutes and 12 seconds; also CPU-wait jumps up and “Free Buffer Waits” become really high (from dbconsole).
Do you know what’s happing here? Is this because of frequent table extents? The extent size for these tables is 1,048,576 bytes. I have a feeling DB is trying to extend the table storage.
I am really confused about this. So any help would be great!
This is the insert statement:
begin
for i in 1 .. 100000 loop
insert into customer
(id, business_name, address1,
address2, city,
zip, state, country, fax,
phone, email
)
values (customer_seq.nextval, dbms_random.string ('A', 20), dbms_random.string ('A', 20),
dbms_random.string ('A', 20), dbms_random.string ('A', 20),
trunc (dbms_random.value (10000, 99999)), 'CA', 'US', '798-779-7987',
'798-779-7987', 'asdfasf#asfasf.com'
);
end loop;
end;
Here dstat output (CPU, IO, MEMORY, NET) for :
Empty Table inserts: http://pastebin.com/f40f50dbb
Table with 400K records: http://pastebin.com/f48d8ebc7
Output from v$buffer_pool_statistics
ID: 3
NAME: DEFAULT
BLOCK_SIZE: 8192
SET_MSIZE: 4446
CNUM_REPL: 4446
CNUM_WRITE: 0
CNUM_SET: 4446
BUF_GOT: 1407656
SUM_WRITE: 1244533
SUM_SCAN: 0
FREE_BUFFER_WAIT: 93314
WRITE_COMPLETE_WAIT: 832
BUFFER_BUSY_WAIT: 788
FREE_BUFFER_INSPECTED: 2141883
DIRTY_BUFFERS_INSPECTED: 1030570
DB_BLOCK_CHANGE: 44445969
DB_BLOCK_GETS: 44866836
CONSISTENT_GETS: 8195371
PHYSICAL_READS: 930646
PHYSICAL_WRITES: 1244533
UPDATE
I dropped indexes off this table and performance improved drastically even when inserting 100K into 600K records table (which took 47 seconds with no CPU wait - see dstat output http://pastebin.com/fbaccb10 ) .
Not sure if this is the same in Oracle, but in SQL Server the first thing I'd check is how many indexes you have on the table. If it's a lot the DB has to do a lot of work reindexing the table as records are inserted. It's more difficult to reindex 500k rows than 100k.
The indices are some form of tree, which means the time to insert a record is going to be O(log n), where n is the size of the tree (≈ number of rows for the standard unique index).
The fastest way to insert them is going to be dropping/disabling the index during the insert and recreating it after, as you've already found.
Even with indexes, 4 minutes to insert 100,000 records seems like a problem to me.
If this database has I/O problems, you haven't fixed them and they will appear again. I would recommend that you identify the root cause.
If you post the index DDL, I'll time it for a comparison.
I added indexes on id and business_name. Doing 10 iterations in a loop, the average time per 100,000 rows was 25 seconds. This was on my home PC/server all running on a single disk.
Another trick to improve performance is to turn on or set the cache higher on your sequence(customer_seq). This will allow oracle to allocate the sequence into memory instead of hitting the object for each insert.
Be careful with this one though. In some situations this will cause gaps your sequence to have gaps between values.
More information here:
Oracle/PLSQL: Sequences (Autonumber)
Sorted inserts always take longer the more entries there are in the table.
You don't say which columns are indexed. If you had indexes on fax, phone or email, you would have had a LOT of duplicates (ie every row).
Oracle 'pretends' to have non-unique indexes. In reality every index entry is unique with the rowid of the actual table row being the deciding factor. The rowid is made up of the file/block/record.
It is possible that, once you hit a certain number of records, the new ones were getting rowids which meant that had to be fitted into the middle of existing indexes with a lot of index re-writing going on.
If you supply full table and index creation statements, others would be able to reproduce the experience which would have allowed for more evidence based responses.
i think it has to do with the extending the internal structure of the file, as well as building database indexes for the added information - i believe the database arranges the data in a non-linear fashion that helps speed up data retrieval on selects