Is direct-path insert a good way to do bulk inserts in Oracle?

Is direct-path insert a good way to do bulk inserts in Oracle? - oracle

We're trying to figure out the best way to handle BULK INSERTs using Oracle (10gR2), and I'm finding that it can be a pretty complicated subject. One method that I've found involves using the Append optimizer hint:
INSERT /*+ Append*/
INTO some_table (a, b)
VALUES (1, 2)
My understanding is that this will tell Oracle to ignore indexes and just put the results at the end of the table. Then, all I should have to do is rebuild the indexes:
ALTER INDEX some_index REBUILD
This would be easier than trying to launch SQL*Loader as an external process or doing some pl/SQL. This almost seems too easy. Is there something I'm missing? Any things that could come back to bite me if I take this approach?

A few notes ...
A single row cannot be appended, therefore APPEND is only valid with INSERT INTO ... SELECT FROM syntax.
An append is the addition of data above the high water mark of the table, in which the data is formatted into complete blocks that are then written to the table and which bypass the SQL engine
An append in parallel mode requires that each parallel query thread allocate at least one new extent to the table, into which the new blocks are written. This can be wasteful of space.
The indexes are not ignored, but maintenance of them is defered until the blocks have been written into the table.
See he docs for more important information: http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/tables.htm#ADMIN01509

Related

In Oracle, what dictionary table tells me the "store in" value for partitioned indexes?

We are running Oracle 11g and have some partitioned tables. I am trying to write an automated process to script out the indexes on these tables. (Basically when we do bulk loads, we want to drop all the indexes beforehand and recreate them afterward.)
The problem I have is knowing how to script out the partitioned indexes. Some are created with "LOCAL STORE IN (tablespacename)" and others just with "LOCAL" (which stores index extents in the same partition as the data). In either case, dba_indexes.tablespace_name is null, and I have having a heck of a time scripting out the two different cases correctly.
I know I can simply re-run the original DDL to recreate the indexes, but multiple parts of the organization can make changes, and there would be less risk if the loader tool could be self-contained and simply rebuild whatever was there to begin with.
I can query dba_ind_subpartitions, and if the tablespace_name values for every subpartition all match, then I can assume/infer that I should STORE IN that tablespace name. But, if the table is in a small single-partition state (e.g. newly created or just after archival), then the ones created with just LOCAL also match this test, so this is also not a perfect way of telling them apart.
I can compare the names of the index subpartition tablespaces to the data table partition tablespaces, and if they match, then I can assume/infer that those should be created with just LOCAL. But, that drags a bunch of extra tables into my query and makes it really hard to read, so I am worried about maintainability going forward. Plus, it just seems like a kludge.
It seems like there should be someplace in Oracle's data dictionaries where it is simply keeping track of this, and where I can just directly look it up instead of having to do a bunch of math and rely on assumptions. But, I have done a good deal of digging and haven't yet found it. So, any help would be much appreciated.

Although an insert alone is faster without the presence of indexes, have you benchmarked a load into tables with indexes enabled and established that it is slower than disabling (more robust than dropping!) and rebuilding them?
When you direct path insert into a table with indexes, Oracle optimises the index maintenance process by creating temporary segments to hold just the data required for the index builds. This generally allows the index maintenance to scan much smaller segments than otherwise required -- the temp segments plus the existing indexes.

Well, as jonearles describes, the dbms_metadata package is the way to generate DDL for existing objects.
But, it seems to me, this is more work than is required for what you're trying to achieve. If this is all for loading data, I recommend you simply alter the indexes to be unusable, set 'skip_unusable_indexes=true', do the data load, and the rebuild the indexes.
This should achieve what you want, without having to drop and re-create the indexes.

DBMS_METADATA.GET_DDL is easier than querying the data dictionary:
--Sample table and index.
create table test1(a number);
create index test1_idx on test1(a);
--Store the DDL, drop the index, then re-create it.
declare
ddl_before clob;
begin
ddl_before := dbms_metadata.get_ddl('INDEX', 'TEST1_IDX');
execute immediate 'drop index test1_idx';
--Do some processing here.
execute immediate ddl_before;
end;
/

Oracle Insert All vs Insert

In Oracle I came across two types of insert statement
1) Insert All: Multiple entries can be inserted using a single sql statement
2) Insert : One entry will be updated per insert.
Now I want to insert around 100,000 records at a time. (Table have 10 fields with includes a primary key). I am not concerned about any return value.
I am using oracle 11g.
Can you please help me with respect to performance which is better "Insert" or "Insert All".

I know this is kind of a Necro but it's pretty high on the google search results so I think this is a point that worth making.
Insert All can give dramatic performance benefits if you are building a web application because it is a single SQL statement that requires only one round trip to your database. In most cases although far from all cases. the majority of the cost of a query is actually latency. Depending on what framework you are using, this syntax can help you avoid unnecessary round trips.
This might seem incredibly obvious but I have seen many, many production web applications in large companies that have forgotten this simple fact.

Insert statement and insert all statement are practically the same conventional insert statement. insert all, which has been introduced in 9i version simply allows you to do insertion into multiple tables using one statement. Another type of insert that you could use to speed up the process is direct-path insert - you use /*+ append*/ or /*+ append_values*/(Oracle 11g) hints
insert /*+ append*/ into some_table(<<columns>>)
select <<columns or literals>>
from <<somwhere>>
or (Oracle 11g)
insert /*+ append_values*/ into some_table(<<columns>>)
values(<<values>>)
to tell Oracle that you want to perform direct-path insert. But, 100K rows it's not that many rows and conventional insert statement will do just fine. You wont get significant performance advantage using direct-path insert with that amount of data. Moreover direct-path insert wont reuse free space, it adds new data after HWM(high water mark), hence require more space. You wont be able to use select statement or other DML statement, if you has not issued commit.

To use FORALL you would need PLSQL tables.
This process is quite fast.
You can also choose the table to have NO LOG option which would speed the process up during inserts.

Will inserting half a million entries with the same date value be slowed by a non-unique index on that date column?

I have a cursor that selects all rows in a table, a little over 500,000 rows. Read a row from cursor, INSERT into other table, which has two indexes, neither unique, one numeric, one 'DATE' type. COMMIT. Read next row from Cursor, INSERT...until Cursor is empty.
All my DATE column's values are the same, from a timestamp initialized at the start of the script.
This thing's been running for 24 hours, only posted 464K rows, a little less than 10K rows / hr.
Oracle 11g, 10 processors(!?)
Something has to be wrong. I think it's that DATE index trying to process all these entries with exactly the same value for that column.

Why don't you just do:
insert into target (columns....)
select columns and computed values
from source
commit
?
This slow by slow is doing far more damage to performance than an index that may not make any sense.

Indexes slow down inserts but speed up queries. This is normal.
If it is a problem you can remove the index, insert the rows, then add the index again. This can be faster if you are doing many inserts at once.
The way you are copying the data using cursors seems to be inefficient. You could try a set-based approach instead:
INSERT INTO table1 (x, y, z)
SELECT x, y, z FROM table2 WHERE ...

Committing after every inserted row doesn't make much sense. If you're worried about exceeding undo capacity, for example, you can keep a count of the inserts and issue a commit after every thousand rows.
Updating the indexes will have some impact but that's unavoidable if you can't drop (or disable) while the inserts are performed, but that's just how it goes. I'd expect the commits to have a bigger impact, though I suspect that's a topic with varied opinions.
This assumes you have a good reason for inserting from a cursor rather than as a direct insert into ... select from model.

In general, its often a good idea to delete the indexes before doing a massive insert and then add them back afterwards, so that the db doesnt have to try to update the indexes with each insert. Its been a long while since I've used oracle, but had you tried putting more than one insert statement in a transaction? That should also speed it up.

For operations like this you should look at oracle bulk operations, using FORALL and BULK COLLECT. It will reduce the number of DDL operations on the underlying tables considerably
create or replace procedure fast_proc is
type MyTable is table of source_table%ROWTYPE;
MyTable table;
begin
select * BULK COLLECT INTO table from source_table;
forall x in table.First..table.Last
insert into dest_table values table(x) ;
end;

Agreed on comment that what is killing your time is the 'slow by slow' processing. Copying 500,000 rows should be a matter of minutes.
The single INSERT ... SELECT FROM .... approach would be the best one, provided you have big enough Rollback segments. The database may even automatically apply parallel techniques to a plain SQL statement that it will not do with PL/SQL.
In addition you could look at using the /*+ APPEND */ hint - read up on it and see if it may apply to the situation with your target table.
o use all 10 cores you will need to either use plain parallel SQL, or run 10 copies of your pl/sql block, splitting the source table across the 10 copies.
In Oracle 10 this is a manual task (roll your own parallelism) but Oracle 11.2 introduces DBMS_PARALLEL_EXECUTE.
Failing that, bulking up your fetch / insert using the BULK COLLECT & bulk insert would be the next best option - process in chunks of 1000 or so rows (or larger). Again take a look as to whether DBMS_PARALLEL_EXECUTE may help you, or if you could submit the job in chunks via DBMS_JOB.
(Caveat : I don't have access to anything later than Oracle 10)

can oracle types be updated like tables?

I am converting GTT's to oracle types as explained in an excellent answer by APC. however, some GTT's are being updated based on a select query from another table. For example:
UPDATE my_gtt_1 c
SET (street, city, STATE, zip) = (SELECT src.unit_address,
src.unit_city,
src.unit_state,
src.unit_zip_code
FROM (SELECT mbr.ROWID row_id,
unit_address,
RTRIM(a.unit_city) unit_city,
RTRIM(a.unit_state) unit_state,
RTRIM(a.unit_zip_code) unit_zip_code
FROM table_1 b,
table_2 a,
my_gtt_1 mbr
WHERE type = 'ABC'
AND id = b.ssn_head
AND a.h_id = b.h_id
AND row_id >= v_start_row
AND row_id <= v_end_row) src
WHERE c.ROWID = src.row_id)
WHERE state IS NULL
OR state = ' ';
if my_gtt_1 was not a global temporary table but an oracle collection type then is it possible to do updates this complex? Or in these cases we are better off using the global temporary table?

you can not perform set UPDATE operations on object types. You will have to do it row by row, as in:
FOR i IN l_tab.FIRST..l_tab.LAST LOOP
SELECT src.unit_address,
src.unit_city,
src.unit_state,
src.unit_zip_code
INTO l_tab(i).street,
l_tab(i).city,
l_tab(i).STATE,
l_tab(i).zip
FROM (your_query) src;
END LOOP;
You should therefore try to do all computations at creation time (where you can BULK COLLECT). Obviously, if your process needs many steps you might find that a global temporary table outperforms an in-memory structure.
From the last questions you have asked, it seems you are trying to replace all global temporary tables with object tables. I would suggest caution because in general, they are not interchangeable:
Objects tables are in-memory structures: you don't want to load a million+ rows table into memory. They are mainly used as a buffer: you load a few (100 for example) rows into the structure, perform what you need to do with these rows then load the next batch. You can not easily treat this structure as a regular table: for example you can only search this structure efficiently with the standard indexing key (you cannot search by rowid in your example unless you define the structure to be indexed by rowid).
Temporary tables on the other hand are very similar to ordinary tables. You can load millions of rows in them, perform joins, complex set operations. You can index the temporary table for further optimization.
In my opinion, the change your are trying to conduct will take a massive overhaul of your logic and it may not perform better. In general, you would not replace GTT with object tables. You may be able to remove GTT with significant gain in performance by using SET operations directly (perform massive UPDATE/DELETE/INSERT on your data directly without a staging table).
I would suggest performing benchmarks before choosing a solution (this is probably what you are doing right now :)

I think this part of APC's answer to your previous question is relevant here:
Global temporary tables are also good
if we have a lot of intermediate
processing which is just too
complicated to be solved with a single
SQL query. Especially if that
processing must be applied to subsets
of the retrieved rows.
You cannot update the in-memory data with an UPDATE statement like you can a GTT; you would need to write procedural code to locate and change the array elements in question.

What is the fastest way to insert data into an Oracle table?

I am writing a data conversion in PL/SQL that processes data and loads it into a table. According to the PL/SQL Profiler, one of the slowest parts of the conversion is the actual insert into the target table. The table has a single index.
To prepare the data for load, I populate a variable using the rowtype of the table, then insert it into the table like this:
insert into mytable values r_myRow;
It seems that I could gain performance by doing the following:
Turn logging off during the insert
Insert multiple records at once
Are these methods advisable? If so, what is the syntax?

It's much better to insert a few hundred rows at a time, using PL/SQL tables and FORALL to bind into insert statement. For details on this see here.
Also be careful with how you construct the PL/SQL tables. If at all possible, prefer to instead do all your transforms directly in SQL using "INSERT INTO t1 SELECT ..." as doing row-by-row operations in PL/SQL will still be slower than SQL.
In either case, you can also use direct-path inserts by using INSERT /*+APPEND*/, which basically bypasses the DB cache and directly allocates and writes new blocks to data files. This can also reduce the amount of logging, depending on how you use it. This also has some implications, so please read the fine manual first.
Finally, if you are truncating and rebuilding the table it may be worthwhile to first drop (or mark unusable) and later rebuild indexes.

Regular insert statements are the slowest way to get data in a table and not meant for bulk inserts. The following article references a lot of different techniques for improving performance: http://www.dba-oracle.com/oracle_tips_data_load.htm

Drop the index, then insert the rows, then re-create the index.

If dropping the index doesn't speed things up enough, you need the Oracle SQL*Loader:
http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html

Suppose you have taken eid,ename,sal,job. So create a table first as:
SQL>create table tablename(eid number, ename varchar2(20),sal number,job char(10));
Now insert data:-
SQL>insert into tablename values(&eid,'&ename',&sal,'&job');

Check this link
http://www.dba-oracle.com/t_optimize_insert_sql_performance.htm
main points to consider for your
case is to use Append hint as this
will directly append into the table
instead of using freelist. If you can afford to turn off logging than use append with nologging hint to do it
Use a bulk insert instead instead of iterating in PL/SQL
Use sqlloaded to load the data directly into the table if you are getting data from a file feed

Here are my recommendations on fast insert.
Trigger - Disable any triggers associated with a table. Enable after Inserts are complete.
Index - Drop Index and re-create it after your Inserts are complete.
Stale stats - Re-analyze table and index stats.
Index de-fragmentation - Rebuild Index if needed
Use No Logging -Insert using INSERT APPEND (Oracle only). This approach is very risky approach, no redo logs are generated therefore you can’t do a rollback - make a backup of table before you start and don't try on live tables. Check if your db has similar option
Parallel Insert: Running parallel insert will get the job faster.
Use Bulk Insert
Constraints - Not much overhead during inserts but still a good idea to check, if it is still slow after even after step 1
You can learn more on http://www.dbarepublic.com/2014/04/slow-insert.html

Maybe one of your best option is to avoid Oracle as much as possible actually.
I've been baffled by this myself, but very often a Java process can outperform many of the Oracle's utilities which either use OCI (read: SQL Plus) or will take up so much of your time to get right (read: SQL*Loader).
This doesn't prevent you to use specific hints either (like /APPEND/).
I've been pleasantly surprised each time I've turned to that kind of solution.
Cheers,
Rollo

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio