unique constraint violated error - performance

I am inserting millions of records into a table; the operation will take hours, maybe a day. After 2 hours the connection from my PC was dropped, so I want to repeat the insert from the start.
My question
Which is faster: truncating the table and repeating the insert from scratch, or creating a primary key and continuing, even though a 'unique constraint violated' error will be raised for every record that was already inserted during the first 2 hours?

Truncating the table (if it's a full refresh) is the best option, hands down. There's also the SKIP parameter, if you use Oracle's SQL*Loader utility. Let me explain a bit!
Also try loading the table with SQL*Loader using the DIRECT load option, which loads data straight into the data blocks instead of issuing conventional INSERT statements.
With this kind of load you can enable UNRECOVERABLE, which means little or no redo log is written, so the load is very fast, upwards of 70% faster than conventional INSERTs.
The downside of this load mode is that all indexes on the table are made UNUSABLE before the load starts (NOT NULL constraints remain enforced), and then the data is loaded. On successful completion, SQL*Loader tries to re-enable each index by rebuilding it. So if the load is interrupted for any reason, the error messages are logged properly and the index is left UNUSABLE.
More details on DIRECT vs. conventional loading: please find here.
Also, using SQL*Loader you can run a conventional load, which means SQL*Loader generates chunks of INSERT statements from the file and processes them. With this type of load, all the indexes are left as they are and the table remains unharmed.
If any error happens, SQL*Loader logs a SKIP count, which means that on the next run, if you specify that number, the table is loaded from that point of the file.
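For illustration, a resume-after-failure run might look like this; the control file name, credentials, table, and skip count are all hypothetical, and the number of records to skip is taken from the log of the interrupted run:
$ cat load_big.ctl
load data
infile '/home/scott/big_table.csv'
append
into table big_table
fields terminated by ','
( id, payload )
$ sqlldr userid=scott/tiger control=load_big.ctl log=load_big.log skip=1500000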
More details on SQL*Loader: here.

Not sure how you are loading your table, but this is a classic situation where you should use an Oracle external table.
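A minimal sketch of such an external table; the directory object, file name, and column list are all assumptions for illustration:
CREATE DIRECTORY data_dir AS '/home/oracle/data';   -- server-side path holding the CSV
CREATE TABLE big_table_ext (
  id      NUMBER,
  payload VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('big_table.csv')
)
REJECT LIMIT UNLIMITED;
The load then becomes a plain INSERT ... SELECT that can run entirely server-side (e.g., from a scheduled job), so it doesn't depend on a client connection staying up for hours.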

Related

How to safely update hive external table

I have an external Hive table and I would like to refresh the data files on a daily basis. What is the recommended way to do this?
If I just overwrite the files, and we are unlucky enough to have some other Hive queries executing in parallel against this table, what will happen to those queries? Will they just fail? Will my HDFS operations fail? Or will they block until the queries complete?
If availability is a concern and space isn't an issue, you can do the following:
1. Make a synonym for the external table. Make sure all queries use this synonym when accessing the table.
2. When loading new data, load it into a new table with a different name.
3. When the load is complete, point the synonym to the newly loaded table (see the sketch after this list).
4. After an appropriate length of time (long enough for any running queries to finish), drop the previous table.
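Hive has no synonyms as such, but a view can play the same role. A minimal sketch of the swap, with all table, view, and path names hypothetical:
-- one-time setup: all readers query the view, never the tables
CREATE VIEW sales AS SELECT * FROM sales_v1;
-- daily refresh: create a fresh table under a new name and load it
CREATE EXTERNAL TABLE sales_v2 LIKE sales_v1 LOCATION '/data/sales_v2';
-- ... copy the new data files into /data/sales_v2 ...
-- repoint readers, then retire the old table once running queries drain
ALTER VIEW sales AS SELECT * FROM sales_v2;
DROP TABLE sales_v1;   -- external table: drops metadata only, not the files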
First of all, when a table is being accessed it can be locked in two ways: exclusive (when data is being added) and shared (when data is being read).
So if you INSERT OVERWRITE and add data to the table, any other queries you run against it at the same time won't execute, because there will be an exclusive lock on it; once the INSERT OVERWRITE query completes, you can access the table again.
Please refer to the following link:
https://cwiki.apache.org/confluence/display/Hive/Locking
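For illustration, if a lock manager is enabled you can inspect current locks from the Hive shell (table name hypothetical):
SHOW LOCKS sales;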

How to rollback or not commit with sql loader [duplicate]

If while loading this file
$ cat employee.txt
100,Thomas,Sales,5000
200,Jason,Technology,5500
300,Mayla,Technology,7000
400,Nisha,Marketing,9500
500,Randy,Technology,6000
501,Ritu,Accounting,5400
using the control file (say) sqlldr-add-new.ctl, I come to know that all the records are faulty, and I want the previously loaded records in that table (those that were loaded yesterday) to be retained if today's load has any errors. How do I handle this exception?
This is my sample ctl file
$ cat sqlldr-add-new.ctl
load data
infile '/home/ramesh/employee.txt'
into table employee
fields terminated by ","
( id, name, dept, salary )
You can't roll back from SQL*Loader; it commits automatically. This is mentioned in the ERRORS parameter description:
On a single-table load, SQL*Loader terminates the load when errors exceed this error limit. Any data inserted up to that point, however, is committed.
And there's a section on interrupted loads.
You could attempt to load the data into a staging table and, if that succeeds, move the data into the real table (with delete/insert into .. select .., or with a partition swap if you have a large amount of data). Or you could use an external table and do the same thing, but you'd need a way to determine whether the table had any discarded or rejected records.
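A sketch of the staging-table variant, using the employee table from the question (the staging table name is hypothetical); SQL*Loader targets only the staging table, so its automatic commits never touch the real one:
-- after a clean SQL*Loader run into employee_stg:
INSERT INTO employee SELECT * FROM employee_stg;
COMMIT;
-- after a failed run: throw the staged rows away and retry later
TRUNCATE TABLE employee_stg;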
Try with ERRORS=0.
You can find the full explanation here:
http://docs.oracle.com/cd/F49540_01/DOC/server.815/a67792/ch06.htm
ERRORS (errors to allow)
ERRORS specifies the maximum number of insert errors to allow. If the number of errors exceeds the value of the ERRORS parameter, SQL*Loader terminates the load. The default is 50. To permit no errors at all, set ERRORS=0. To specify that all errors be allowed, use a very high number.
On a single-table load, SQL*Loader terminates the load when errors exceed this error limit. Any data inserted up to that point, however, is committed.
SQL*Loader maintains the consistency of records across all tables. Therefore, multi-table loads do not terminate immediately if errors exceed the error limit. When SQL*Loader encounters the maximum number of errors for a multi-table load, it continues to load rows to ensure that valid rows previously loaded into tables are loaded into all tables and/or rejected rows are filtered out of all tables.
In all cases, SQL*Loader writes erroneous records to the bad file.
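For illustration, the parameter can be passed on the command line (credentials hypothetical, control file as in the question):
$ sqlldr userid=ramesh/secret control=sqlldr-add-new.ctl errors=0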

Operations on certain tables won't finish

We have a table TRANSMISSIONS(ID, NAME) which behaves funny in the following ways:
The statement to add a foreign key in another table referencing TRANSMISSIONS.ID won't finish
The statement to add a column to TRANSMISSIONS won't finish
The statement to disable/drop a unique constraint won't finish
The statement to disable/drop a trigger won't finish
TRANSMISSIONS' primary key is ID, and there is also a unique constraint on NAME; therefore there are indexes on ID and NAME. We also have a trigger that creates values for column ID from a sequence, so that INSERT statements do not need to provide a value for ID.
Besides TRANSMISSIONS, there are two more tables behaving like this. For other tables, the above-mentioned statements work fine.
The database is used in an application with Hibernate, and due to an incorrect JPA configuration we produced high values for ID for a time. Note that we use the trigger only for "manual" INSERT statements and that Hibernate produces ID values itself, also using the sequence.
The first thought was that the problems were due to the high IDs, but we have the problems also with tables that never had such high IDs.
Anyway, we suspected that the indexes might be fragmented somehow and ran ALTER INDEX TRANSMISSIONS_PK SHRINK SPACE COMPACT, which ran through but showed no effect.
We also wanted to run ALTER TABLE TRANSMISSIONS SHRINK SPACE COMPACT, which didn't work because we first had to run ALTER TABLE TRANSMISSIONS ENABLE ROW MOVEMENT, which never finished.
We have another instance of the database which does not behave in such a funny way, so we think that in the course of running the application the database somehow got into an inconsistent state.
Does anyone have suggestions as to what might have gone out of control or into an inconsistent state?
More hints:
There are no locks present on any objects in the database (according to the information in v$lock and v$locked_object).
We tried all these statements in SQL Developer and also in SQL*Plus (the command-line tool).
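As a diagnostic sketch on top of v$lock: DDL that never finishes is usually waiting behind another session, often one with an open, uncommitted transaction (e.g., from a connection pool). Both queries below use standard dynamic performance views:
-- sessions that are being blocked, and by whom
SELECT sid, serial#, blocking_session, event, seconds_in_wait
  FROM v$session
 WHERE blocking_session IS NOT NULL;
-- open transactions that may hold TM locks on the table
SELECT s.sid, s.serial#, s.program, t.start_time
  FROM v$transaction t
  JOIN v$session s ON s.taddr = t.addr;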

How to tune Oracle's SQL*Loader append?

I am writing a Java program that creates a CSV file with 6,800,000 records conforming to specific distribution parameters and populates a table using Oracle's SQL*Loader.
I am testing my program with record sets of different sizes (50,000 and 500,000). Generating the CSV file itself is quite fast; using concurrency, it takes milliseconds to create these records and write them to the file.
Inserting those records, on the other hand, is taking too long. Reading the log file generated by SQL*Loader, it takes 00:00:32.90 to populate the table with 50,000 records and 00:07:58.83 to populate it with 500,000.
SQL*Loader benchmarks I've googled show much better performance, such as 2 million rows in less than 2 minutes. I've followed this tutorial to improve the time, but it barely changed at all. There's obviously something wrong here, but I don't know what.
Here's my control file:
OPTIONS (SILENT=ALL, DIRECT=TRUE, ERRORS=50, COLUMNARRAYROWS=50000, STREAMSIZE=500000)
UNRECOVERABLE LOAD DATA
APPEND
INTO TABLE MY_TABLE
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
...
Another important detail: I've tried using PARALLEL=TRUE, but I get the ORA-26002 error (Table MY_TABLE has index defined upon it). Unfortunately, running with SKIP_INDEX_MAINTENANCE renders the index UNUSABLE.
What am I doing wrong?
Update
I have noticed that soon after the program starts (in less than a second), all rows are already present in the database. Yet SQL*Loader stays busy and only finishes after 32-45 seconds.
What could it be doing?
One thought would be to create an external table pointing at the CSV file. Then, after creating the file, you can run a SQL script inside Oracle to process the data directly.
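A sketch of that approach; the external table name is hypothetical, and the APPEND hint requests a direct-path insert:
-- my_table_ext is an external table mapped to the generated CSV
INSERT /*+ APPEND */ INTO MY_TABLE
SELECT * FROM my_table_ext;
COMMIT;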
Or, look at the following (copied from here):
This issue is caused by using the bulk load option in parallel to load an Oracle target that has an index on it. It is an Oracle limitation.
To resolve this issue, do one of the following:
- Change the target load option to Normal.
- Disable the 'enable parallel mode' option in the relational connection browser.
- Drop the indexes before loading.
- Create pre- and post-session SQL to drop and re-create the indexes and key constraints.

sqlldr corrupts my primary key after the first commit

sqlldr is corrupting my primary key index after the first commit in my load. After the first commit, no matter what I set the ROWS value to in my control file, I get:
ORA-39776: fatal Direct Path API error loading table PE_OWNER.CLINICAL_CODE
ORA-01502: index 'PE_OWNER.CODE_PK' or partition of such index is in unusable state
SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.
I'm using Oracle database and client 11.1.0.6.0.
I know the issue is not due to duplicate rows, because if I set the ROWS directive to a huge value, the index is not corrupt after sqlldr does a single commit for the entire file. This gives me a workaround, but it's still a little alarming...
Thanks for any guidance anyone can give.
I don't use SQL*Loader much on production tables, but from what I've read, you need to use a conventional path load.
from the SQL*Loader documentation
When to Use a Conventional Path Load
If load speed is most important to you, you should use direct path load because it is faster than conventional path load. However, certain restrictions on direct path loads may require you to use a conventional path load. You should use a conventional path load in the following situations:
* When accessing an indexed table concurrently with the load, or when applying inserts or updates to a nonindexed table concurrently with the load
To use a direct path load (with the exception of parallel loads), SQL*Loader must have exclusive write access to the table and exclusive read/write access to any indexes.
I believe the issue was that Oracle did not have time to rebuild the indexes on the table in question, so I increased the batch commit size (the ROWS parameter) to a number larger than the number of records I was importing.
That fixed the issue.
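For illustration, the workaround amounts to something like this (credentials and control file name hypothetical); in a direct path load, ROWS controls how often data saves happen, so a value above the record count means a single save and a single index rebuild at the end:
$ sqlldr userid=pe_owner/secret control=clinical_code.ctl direct=true rows=100000000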
