Sessions blocking each other during direct-path insert into a named subpartition - Oracle

We have one large table whose data we need to insert into another table.
The target table is range-partitioned by day and subpartitioned by department.
To load the data we used dbms_parallel_execute and created a task whose chunking SQL returns the list of departments, with a parallel level of 20, i.e. at most 20 chunks (one per department) run at a time. Each chunk selects that department's data from the source table and inserts it into the target table.
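For reference, the task setup looks roughly like this (task name, chunking SQL and the loader procedure are illustrative, not our exact code):
begin
  dbms_parallel_execute.create_task(task_name => 'LOAD_BY_DEPT');

  -- one chunk per department: start_id and end_id both carry the department id
  dbms_parallel_execute.create_chunks_by_sql(
    task_name => 'LOAD_BY_DEPT',
    sql_stmt  => 'select dept_id, dept_id from departments',
    by_rowid  => false);

  -- load_one_department is the loader procedure (name illustrative)
  dbms_parallel_execute.run_task(
    task_name      => 'LOAD_BY_DEPT',
    sql_stmt       => 'begin load_one_department(:start_id, :end_id); end;',
    language_flag  => dbms_sql.native,
    parallel_level => 20);
end;
/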
Before doing the insert, we first get the subpartition name and generate the following statement:
INSERT /*+ NO_GATHER_OPTIMIZER_STATISTICS ENABLE_PARALLEL_DML APPEND_VALUES */ into Target_Table subpartition (subpartition_name) values (:B1, :B2, :B3, ....) ;
We read in the Oracle documentation that specifying the subpartition in the insert locks only that subpartition and that the direct-path append will still work. The goal was to create n jobs that independently insert into their own given subpartitions. The append itself works, but when we monitor v$session while the load runs we see that:
BLOCKING_SESSION_STATUS is VALID
FINAL_BLOCKING_SESSION_STATUS is VALID
EVENT is library cache lock
STATE is WAITING
WAIT_CLASS is Concurrency
From this we conclude that one APPEND_VALUES insert is still blocking other sessions from inserting into a different subpartition. Is there something we missed? We have enabled parallel DML, disabled the target table's indexes, set skip_unusable_indexes to true, there are no referential constraints on the target table, and the table, partitions and subpartitions are all set to NOLOGGING.
EDIT: Tested the same thing with another table that is also partitioned, but it has no subpartitions; it is only list-partitioned. So instead of subpartition (subpartition_name) the insert statement used partition (partition_name). In this case the inserts ran without sessions waiting on each other and no locks were held. I am assuming the above won't work with interval-subpartitioned tables.
EDIT2: I have created the same scenario in another database, also Oracle 19c: created a table with partitions and subpartitions, set the interval, disabled indexes, set NOLOGGING and ran the job that inserts into subpartitions. Surprisingly, the insert went through without errors and no sessions blocked each other. Now I am thinking maybe it's some database parameter that should be turned on or changed, because the database versions, table structures, jobs and inserts are the same, yet in one database sessions block each other and in the other they do not.
UPDATE Adding the insert part of the code :
if c_tab_cursor%isopen then
   close c_tab_cursor;
end if;

open c_tab_cursor;
loop
   fetch c_tab_cursor bulk collect into v_row limit 100000;
   exit when (v_row.count = 0);

   forall i in v_row.first .. v_row.last
      insert /*+ NO_GATHER_OPTIMIZER_STATISTICS APPEND_VALUES */ into
         Target_Table subpartition (SYS_P68457)
      values v_row(i);

   commit;
end loop;
close c_tab_cursor;
EDIT3: Adding table info. The table is daily partitioned, and each partition has around 150 subpartitions; at the time of writing the table had 177845 subpartitions in total. My other guess was that Oracle is spending a lot of time finding the right subpartition, which is also arguable because the subpartition name is provided in the insert.

I'd say it is an expected "feature" when you insert into the same segment: direct-path insert writes data above the HWM (high water mark) rather than using the segment's free space map.
When you commit a direct-path insert the HWM advances; when you roll back, the HWM stays where it was and the data is discarded.
Check the segment storage parameter FREELISTS, but I'm afraid even that won't help you.
When your inserts touch different subpartitions this should not be happening.
A library cache lock can be held on various kinds of objects (possibly due to a bug).
IMHO the only way to investigate this is either to use hanganalyze to check which Oracle function is blocked, or to query the P1, P2, P3 parameters of the library cache lock wait and identify which object is blocking the parallel run.
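For example, a rough starting point (P1 is the handle address of the locked library cache object; mapping it to an object name typically needs x$kglob and SYS access):

select sid, event, p1text, p1raw, p2text, p2raw, p3text, p3raw,
       blocking_session, seconds_in_wait
from   v$session
where  event = 'library cache lock';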
PS: I have seen bugs like this before, e.g. only one session could run a Java stored procedure at a time because Oracle unnecessarily held an exclusive lock on some library cache object.

v$session reports the wait state at the precise instant you query it, so it's meaningless unless you keep requerying and keep seeing the same thing. Better yet, use v$active_session_history to see Oracle's own 1-second sampling of the wait state. If you see lots of rows with that wait, then it's meaningful.
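For example, something along these lines (assumes you are licensed for the Diagnostics Pack that v$active_session_history requires):

select event, count(*) as samples
from   v$active_session_history
where  sample_time > systimestamp - interval '10' minute
and    session_state = 'WAITING'
group  by event
order  by samples desc;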
Assuming that it is meaningful, I would point out that you are using a single-row VALUES list and yet are asking for parallel DML. Parallel DML is for multi-row operations, not single-row operations. You can use it for an insert-select, for example, but not for an insert-values.
If your application is necessarily single-row driven, remove the ENABLE_PARALLEL_DML and APPEND_VALUES hints. If you are binding arrays to these variables, you can keep APPEND_VALUES but remove ENABLE_PARALLEL_DML; for inserts, parallel DML only works with insert-select.
As you clearly intend to have multiple sessions, each loading a separate subpartition, that is your parallelism right there - you neither need nor want to add another layer of parallelism with PDML.

Related

Optimizing a delete... where query with rownum

I'm working with an application that has a large amount of outdated data clogging up a table in my database. Ideally, I'd want to delete all entries in the table whose reference date is too old:
delete outdatedTable where referenceDate < :deletionCutoffDate
If this statement were to be run, it would take ages to complete, so I'd rather break it up into chunks with the following:
delete outdatedTable where referenceDate < :deletionCutoffDate and rownum <= 10000
In testing, this works surprisingly slowly. The following query, however, runs dramatically faster:
delete outdatedTable where rownum <= 10000
I've been reading through multiple blogs and similar questions on StackOverflow, but I haven't yet found a straightforward description of how/whether using rownum affects the Oracle optimizer when there are other Where clauses in the query. In my case, it seems to me as if Oracle checks
referenceDate < :deletionCutoffDate
on every single row, executes a massive Select on all matching rows, and only then filters out the top 10000 rows to return. Is this in fact the case? If so, is there any clever way to make Oracle stop checking the Where clause as soon as it's found enough matching rows?
How about a different approach without so much DML on the table? As a permanent solution for the future, you could go for table partitioning:
Create a new table with required partition(s).
Move ONLY the required rows from your existing table to the new partitioned table.
Once the new table is populated, add the required constraints and indexes.
Drop the old table.
In the future, you would just need to drop the old partitions.
CTAS (create table as select) is another way; however, if you want the new table to be partitioned, you would have to use the exchange partition concept.
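A rough sketch of that exchange-partition idea (table and partition names are illustrative):

-- 1) the CTAS result holds only the rows you want to keep
create table keep_rows as
  select * from outdatedTable where referenceDate >= :deletionCutoffDate;

-- 2) swap that segment into the matching partition of the new partitioned table
alter table outdated_partitioned
  exchange partition p_current with table keep_rows
  including indexes without validation;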
First of all, you should read about SQL execution plans and learn how to read them (EXPLAIN PLAN). It will help you answer such questions yourself.
Generally, one single delete is more efficient than several chunked ones. Its main disadvantage is heavy use of the undo tablespace.
If you wish to delete most of a table's rows, a much faster way is usually this trick:
create table new_table as select * from old_table where date_col >= :date_limit;
drop table old_table;
rename new_table to old_table;
-- ... recreate indexes, constraints and other stuff ...
If you wish to do it more than once, partitioning is a much better way. If the table is partitioned by date, you can select the current data quickly and drop the partition with outdated data in milliseconds.
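For example (partition name illustrative):

-- dropping an obsolete partition is a quick DDL operation
alter table outdatedTable drop partition p_2012_q1 update global indexes;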
Finally, partitioning is a way to avoid "deleting outdated records" altogether. Sometimes we still need old data, and it's a pity to delete it by hand. With partitioning you can archive outdated partitions outside of the database and reconnect them when you need to access the old data.
This is an old request, but I'd like to show another approach (also using partitions).
Depending on what you consider old, you could create corresponding partitions (optimally exactly two; one current, one old; but you could just as well make more), e.g.:
PARTITION BY LIST ( mod(referenceDate,2) )
(
PARTITION year_odd VALUES (1),
PARTITION year_even VALUES (0)
);
This could as well be months (Jan, Feb, ... Dec), decades (XX0X, XX1X, ... XX9X), half years (first_half, second_half), etc. Anything circular.
Then whenever you want to get rid of old data, truncate:
ALTER TABLE mytable TRUNCATE PARTITION year_even;
delete from your_table
where PK not in
      (select PK from your_table where rownum <= ...)  -- these are the records you want to keep

Oracle 11g Deleting large amount of data without generating archive logs

I need to delete a large amount of data from my database on a regular basis. The process generates a huge volume of archive logs. We had a database crash at one point because there was no storage space left at the archive destination. How can I avoid generating archive logs while I delete data?
The data to be deleted is already marked as inactive in the database. Application code ignores inactive data. I do not need the ability to rollback the operation.
I cannot partition the data in such a way that inactive data falls in one partition that can be dropped. I have to delete the data with delete statements.
I can ask DBAs to set certain configuration at table level/schema level/tablespace level/server level if needed.
I am using Oracle 11g.
What proportion of the data on the table would be deleted, what volume? Are there any referential integrity constraints to manage or is this table childless?
Depending on the answers, you might consider:
CREATE TABLE keep_data UNRECOVERABLE AS SELECT * FROM ... WHERE [keep condition]
Then drop the original table.
Then rename keep_data to the original table name.
Rebuild the indexes (again with UNRECOVERABLE to prevent redo), constraints, etc.
The problem with this approach is that it's a multi-step DDL process, which you will have a job making fault-tolerant and reversible.
A safer option might be to use data-pump to:
Data-pump expdp to extract the "Keep" data
TRUNCATE the table
Data-pump impdp import of data from step 1, with direct-path
At this point I suggest you read the Oracle manual on Data Pump, particularly the section on Direct Path Loads to be sure this will work for you.
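Very roughly, and only as an illustration (directory, dump file and the keep filter are placeholders; check the exact option syntax in the Utilities guide for your release):

expdp myuser DIRECTORY=dp_dir DUMPFILE=keep.dmp TABLES=big_table QUERY=big_table:"WHERE status = 'ACTIVE'"

SQL> TRUNCATE TABLE big_table;

impdp myuser DIRECTORY=dp_dir DUMPFILE=keep.dmp TABLES=big_table TABLE_EXISTS_ACTION=APPEND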
My preferred option would be partitioning.
Of course, the best way would be TenG's solution (CTAS, drop and rename table), but it seems that's impossible for you.
Your only problems are the amount of archive logs and the database crash. In this case, maybe you could break your delete statement into chunks (for example 10,000 rows at a time).
Something like:
declare
  e number;
  f number;
begin
  select count(*) into e from myTable where [delete condition];
  f := trunc(e / 10000) + 1;
  for i in 1 .. f
  loop
    delete from myTable where [delete condition] and rownum <= 10000;
    commit;
    dbms_lock.sleep(600); -- pause so old archive logs can be purged, if possible
  end loop;
end;
After this operation, you should reorganize your table which is surely fragmented.
Alter the table to set NOLOGGING, delete the rows, then turn logging back on.

Inserting data in a column avoiding duplicates

Let's say I have a query which fetches col1 after joining multiple tables. I want to insert the values of that col1 into a table on a remote DB, i.e. I would be using a dblink to do that.
Now, that col1 would be fetched from 4-5 different DBs. There is a chance that a value fetched from db1 is in db2 as well. How can I avoid duplicates?
In my remote DB, I have made col1 a primary key, so when inserting, an error is thrown if there is a duplicate key, failing the rest of the process, which I don't want. I was thinking about 2 approaches:
Write a PL/SQL script; for each value, determine whether it already exists, and insert it only if it doesn't.
Write a PL/SQL script that inserts and catches the duplicate-key exception. The exception would be ignored and it would keep inserting (it doesn't sound that good).
Which approach would you prefer? Is there anything else I can do?
I would use the MERGE statement with WHEN NOT MATCHED THEN INSERT.
The same MERGE could also update, but it doesn't have to; just leave the update part out.
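Roughly like this (names are illustrative; note that older releases restrict MERGE into a remote table over a db link, so check your version):

merge into customer@remote_db tgt
using (select distinct col1 from local_source) src
on    (tgt.col1 = src.col1)
when not matched then
  insert (col1) values (src.col1);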
The different databases can have duplicate primary keys, but that doesn't mean the records are duplicates. The actual data may be different in each case. Or the records may represent the same real-world thing but at different statuses - don't know, you haven't provided enough explanation.
The point is, you need much more analysis of why duplicate records can exist and probably a more sophisticated approach to handling collisions. Do you need to take all records (in which case you need a synthetic key)? Or do you take only one instance (so how do you decide precedence)? Other scenarios may exist.
In any case, MERGE or PL/SQL loops are likely to be too crude a solution.
First off, I would suggest that your target database drive all of these inserts, because inserting/updating across a database link can create locking issues and further complicate things, especially with multiple databases attempting to access and perform DML on the same table. However, if that isn't possible, the solutions below will work.
I would fix your primary key problem by including a table look-up on the target table for each row.
INSERT INTO customer@dblink.oracle.com
  (emp_name,
   emp_id)
SELECT
  st.employee_name,
  st.employee_id   -- primary key
FROM
  source_table st
WHERE NOT EXISTS
  (SELECT 1
   FROM customer@dblink.oracle.com cust
   WHERE cust.emp_id = st.employee_id);
Again, I would not recommend DML transactions across database links unless absolutely necessary as you can sometimes have weird locking behavior.
A PL/SQL procedure or anonymous PL/SQL block could be used to create a bulk processing solution as follows:
CREATE OR REPLACE PROCEDURE send_unique_data
AS
  TYPE tab_cust IS TABLE OF customer@dblink.oracle.com%ROWTYPE
    INDEX BY PLS_INTEGER;
  t_records tab_cust;
BEGIN
  SELECT
    st.employee_name,
    st.employee_id   -- primary key
  BULK COLLECT
  INTO t_records
  FROM source_table st;

  FORALL i IN t_records.FIRST .. t_records.LAST SAVE EXCEPTIONS
    INSERT INTO customer@dblink.oracle.com
    VALUES t_records(i);
END send_unique_data;
You can also use the SQL%BULK_EXCEPTIONS collection in case you want to do anything with the records that threw exceptions (such as unique constraint violations). Be warned that the target table will suffer performance issues if a lot of duplicate data is being inserted.

Will inserting half a million entries with the same date value be slowed by a non-unique index on that date column?

I have a cursor that selects all rows in a table, a little over 500,000 rows. Read a row from cursor, INSERT into other table, which has two indexes, neither unique, one numeric, one 'DATE' type. COMMIT. Read next row from Cursor, INSERT...until Cursor is empty.
All my DATE column's values are the same, from a timestamp initialized at the start of the script.
This thing's been running for 24 hours and has only posted 464K rows - a little less than 20K rows / hr.
Oracle 11g, 10 processors(!?)
Something has to be wrong. I think it's that DATE index trying to process all these entries with exactly the same value for that column.
Why don't you just do the following?
insert into target (columns....)
select columns and computed values
from source;
commit;
This slow by slow is doing far more damage to performance than an index that may not make any sense.
Indexes slow down inserts but speed up queries. This is normal.
If it is a problem you can remove the index, insert the rows, then add the index again. This can be faster if you are doing many inserts at once.
The way you are copying the data using cursors seems to be inefficient. You could try a set-based approach instead:
INSERT INTO table1 (x, y, z)
SELECT x, y, z FROM table2 WHERE ...
Committing after every inserted row doesn't make much sense. If you're worried about exceeding undo capacity, for example, you can keep a count of the inserts and issue a commit after every thousand rows.
Updating the indexes will have some impact, but that's unavoidable if you can't drop (or disable) them while the inserts are performed - that's just how it goes. I'd expect the commits to have a bigger impact, though I suspect that's a topic with varied opinions.
This assumes you have a good reason for inserting from a cursor rather than as a direct insert into ... select from model.
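If it helps, the commit batching mentioned above could look roughly like this (table names are illustrative; the record insert assumes the source select matches the target table's columns):

declare
  l_rows pls_integer := 0;
begin
  for r in (select * from source_table) loop
    insert into target_table values r;   -- record insert; columns must match
    l_rows := l_rows + 1;
    if mod(l_rows, 1000) = 0 then
      commit;                            -- commit every 1,000 rows
    end if;
  end loop;
  commit;
end;
/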
In general, it's often a good idea to drop the indexes before doing a massive insert and then add them back afterwards, so that the DB doesn't have to try to update the indexes with each insert. It's been a long while since I've used Oracle, but have you tried putting more than one insert statement in a transaction? That should also speed it up.
For operations like this you should look at Oracle bulk operations, using FORALL and BULK COLLECT. They considerably reduce the number of context switches between PL/SQL and SQL when running DML against the underlying tables.
create or replace procedure fast_proc is
  type MyTable is table of source_table%ROWTYPE;
  l_tab MyTable;
begin
  select * bulk collect into l_tab from source_table;

  forall x in l_tab.First .. l_tab.Last
    insert into dest_table values l_tab(x);
end;
Agreed with the comment that what is killing your time is the "slow by slow" processing. Copying 500,000 rows should be a matter of minutes.
The single INSERT ... SELECT FROM .... approach would be the best one, provided you have big enough Rollback segments. The database may even automatically apply parallel techniques to a plain SQL statement that it will not do with PL/SQL.
In addition you could look at using the /*+ APPEND */ hint - read up on it and see if it may apply to the situation with your target table.
To use all 10 cores you will need to either use plain parallel SQL, or run 10 copies of your PL/SQL block, splitting the source table across the 10 copies.
In Oracle 10 this is a manual task (roll your own parallelism) but Oracle 11.2 introduces DBMS_PARALLEL_EXECUTE.
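A minimal sketch of the plain parallel SQL route (degree 10 to match your 10 cores; table names are illustrative):

alter session enable parallel dml;

insert /*+ append parallel(t, 10) */ into target_table t
select /*+ parallel(s, 10) */ s.*
from   source_table s;

commit;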
Failing that, bulking up your fetch / insert using the BULK COLLECT & bulk insert would be the next best option - process in chunks of 1000 or so rows (or larger). Again take a look as to whether DBMS_PARALLEL_EXECUTE may help you, or if you could submit the job in chunks via DBMS_JOB.
(Caveat : I don't have access to anything later than Oracle 10)

What is the fastest way to insert data into an Oracle table?

I am writing a data conversion in PL/SQL that processes data and loads it into a table. According to the PL/SQL Profiler, one of the slowest parts of the conversion is the actual insert into the target table. The table has a single index.
To prepare the data for load, I populate a variable using the rowtype of the table, then insert it into the table like this:
insert into mytable values r_myRow;
It seems that I could gain performance by doing the following:
Turn logging off during the insert
Insert multiple records at once
Are these methods advisable? If so, what is the syntax?
It's much better to insert a few hundred rows at a time, using PL/SQL tables and FORALL to bind into the insert statement. For details on this see here.
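A minimal sketch of that pattern, assuming the source select matches the target table's structure (names are illustrative):

declare
  cursor c_src is select * from source_table;      -- must match mytable's columns
  type t_rows is table of mytable%rowtype index by pls_integer;
  l_rows t_rows;
begin
  open c_src;
  loop
    fetch c_src bulk collect into l_rows limit 500;
    exit when l_rows.count = 0;

    forall i in 1 .. l_rows.count
      insert into mytable values l_rows(i);

    commit;
  end loop;
  close c_src;
end;
/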
Also be careful with how you construct the PL/SQL tables. If at all possible, prefer to instead do all your transforms directly in SQL using "INSERT INTO t1 SELECT ..." as doing row-by-row operations in PL/SQL will still be slower than SQL.
In either case, you can also use direct-path inserts by using INSERT /*+APPEND*/, which basically bypasses the DB cache and directly allocates and writes new blocks to data files. This can also reduce the amount of logging, depending on how you use it. This also has some implications, so please read the fine manual first.
Finally, if you are truncating and rebuilding the table it may be worthwhile to first drop (or mark unusable) and later rebuild indexes.
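For example (index name is illustrative):

alter index mytable_ix unusable;
alter session set skip_unusable_indexes = true;
-- ... run the bulk insert (note: a unique index cannot be skipped this way) ...
alter index mytable_ix rebuild nologging;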
Regular insert statements are the slowest way to get data in a table and not meant for bulk inserts. The following article references a lot of different techniques for improving performance: http://www.dba-oracle.com/oracle_tips_data_load.htm
Drop the index, then insert the rows, then re-create the index.
If dropping the index doesn't speed things up enough, you need the Oracle SQL*Loader:
http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html
Suppose you have the columns eid, ename, sal, job. First create the table:
SQL> create table tablename (eid number, ename varchar2(20), sal number, job char(10));
Now insert data:
SQL> insert into tablename values (&eid, '&ename', &sal, '&job');
Check this link:
http://www.dba-oracle.com/t_optimize_insert_sql_performance.htm
The main point to consider for your case is to use the APPEND hint, as this appends directly into the table instead of using the freelists. If you can afford to turn off logging, use the APPEND hint on a NOLOGGING table.
Use a bulk insert instead of iterating in PL/SQL.
Use SQL*Loader to load the data directly into the table if you are getting the data from a file feed.
Here are my recommendations on fast insert.
Trigger - Disable any triggers associated with a table. Enable after Inserts are complete.
Index - Drop Index and re-create it after your Inserts are complete.
Stale stats - Re-analyze table and index stats.
Index de-fragmentation - Rebuild Index if needed
Use NOLOGGING - insert using INSERT /*+ APPEND */ (Oracle only). This is a risky approach: little or no redo is generated, so the loaded data cannot be recovered from the logs after a failure - make a backup of the table before you start and don't try it on live tables. Check whether your DB has a similar option.
Parallel Insert: Running parallel insert will get the job faster.
Use Bulk Insert
Constraints - not much overhead during inserts, but still a good idea to check if it is still slow even after step 1.
You can learn more on http://www.dbarepublic.com/2014/04/slow-insert.html
Maybe one of your best options is actually to avoid Oracle as much as possible.
I've been baffled by this myself, but very often a Java process can outperform many of Oracle's utilities, which either use OCI (read: SQL*Plus) or take up so much of your time to get right (read: SQL*Loader).
This doesn't prevent you from using specific hints either (like /*+ APPEND */).
I've been pleasantly surprised each time I've turned to that kind of solution.
Cheers,
Rollo
