sqlldr corrupts my primary key after the first commit - oracle

sqlldr is corrupting my primary key index after the first commit of the load. After that first commit, no matter what value I set the ROWS option to in my control file, I get:
ORA-39776: fatal Direct Path API error loading table PE_OWNER.CLINICAL_CODE
ORA-01502: index 'PE_OWNER.CODE_PK' or partition of such index is in unusable state
SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.
I'm using Oracle database and client 11.1.0.6.0.
I know the issue is not due to duplicate rows because if I set the rows directive to a huge value, the index is not corrupt after sqlldr does a single commit for the entire file. This provides me with a workaround, but it's still a little alarming...
Thanks for any guidance anyone can give.

I don't use SQL*Loader much on production tables, but from what I've read, you need to use conventional load.
From the SQL*Loader documentation:

When to Use a Conventional Path Load

If load speed is most important to you, you should use direct path load because it is faster than conventional path load. However, certain restrictions on direct path loads may require you to use a conventional path load. You should use a conventional path load in the following situations:

* When accessing an indexed table concurrently with the load, or when applying inserts or updates to a nonindexed table concurrently with the load

To use a direct path load (with the exception of parallel loads), SQL*Loader must have exclusive write access to the table and exclusive read/write access to any indexes.
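As a rough sketch (the connection string and file names are hypothetical; only the schema comes from the errors above), a conventional path load is simply a sqlldr run without DIRECT=TRUE, since conventional is the default:

sqlldr userid=pe_owner/password@orcl control=clinical_code.ctl log=clinical_code.log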

I believe the issue was that Oracle did not have time to rebuild the indices on the table in question, so I increased the batch commit size to a number larger than the number of records I was importing.
That fixed the issue.
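For reference, when a direct path run aborts like this, the index named in the ORA-01502 message is left UNUSABLE and can be rebuilt manually before retrying (using the names from the errors above):

ALTER INDEX PE_OWNER.CODE_PK REBUILD;

With the single-commit workaround, the ROWS value in the OPTIONS clause just needs to exceed the number of records in the data file.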

Related

How to safely update hive external table

I have an external hive table and I would like to refresh the data files on a daily basis. What is the recommended way to do this?
If I just overwrite the files and we are unlucky enough to have other Hive queries executing in parallel against this table, what will happen to those queries? Will they just fail? Or will my HDFS operations fail? Or will they block until the queries complete?
If availability is a concern and space isn't an issue, you can do the following (a sketch of the pattern follows the steps):
Make a synonym for the external table. Make sure all queries use this synonym when accessing the table.
When loading new data, load it to a new table with a different name.
When the load is complete, point the synonym to the newly loaded table.
After an appropriate length of time (long enough for any running queries to finish), drop the previous table.
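Hive has no synonyms as such, so a view is the closest equivalent for the indirection described above. A minimal sketch with purely illustrative table, view, and path names:

-- One-time setup: all readers query the view, never the tables directly
CREATE VIEW sales_current AS SELECT * FROM sales_20240101;

-- Next refresh: load into a fresh external table
CREATE EXTERNAL TABLE sales_20240102 (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales/20240102';

-- Repoint the "synonym" (the view) once the new data is in place
ALTER VIEW sales_current AS SELECT * FROM sales_20240102;

-- After running queries have drained; for an EXTERNAL table this drops
-- only the metadata, so clean up the old HDFS directory separately
DROP TABLE sales_20240101;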
First of all, a table being accessed can hold one of two types of locks: exclusive (while data is being written) and shared (while data is being read). So if you run an INSERT OVERWRITE to add data to the table, any other queries issued against it at the same time won't execute, because the table holds an exclusive lock; once the INSERT OVERWRITE completes, the table can be accessed again.
Please refer to the following link:
https://cwiki.apache.org/confluence/display/Hive/Locking
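If Hive's lock manager is enabled (see the link above), you can check what currently holds a lock on a table while a load is running; the table name below is illustrative:

SHOW LOCKS sales_current;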

How to tune Oracle's SQL*Loader append?

I am writing a Java program that creates a CSV file with 6,800,000 records conforming to specific distribution parameters and populates a table using Oracle's SQL*Loader.
I am testing my program with different record counts (50,000 and 500,000). CSV file generation by itself is quite fast; using concurrency, it takes milliseconds to create the records and write them to the file.
Inserting said records, on the other hand, is taking too long. Reading the log file generated by SQL*Loader, it takes 00:00:32.90 seconds to populate the table with 50,000 records and 00:07:58.83 minutes to populate it with 500,000.
SQL*Loader benchmarks I've googled show much better performance, such as 2 million rows in less than 2 minutes. I've followed this tutorial to improve the time, but it barely changed at all. There's obviously something wrong here, but I don't know what.
Here's my control file:
OPTIONS (SILENT=ALL, DIRECT=TRUE, ERRORS=50, COLUMNARRAYROWS=50000, STREAMSIZE=500000)
UNRECOVERABLE LOAD DATA
APPEND
INTO TABLE MY_TABLE
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
...
Another important detail: I've tried using PARALLEL=TRUE, but I get the ORA-26002 error (table MY_TABLE has an index defined upon it). Unfortunately, running with SKIP_INDEX_MAINTENANCE leaves the index UNUSABLE.
What am I doing wrong?
Update
I have noticed that soon after running the program (less than a second), all rows are already present in the database. Yet, SQL*Loader is still busy and only finishes after 32-45 seconds.
What could it be doing?
One thought would be to create an external table whose location points at the CSV file. Then, after generating the file, you can run a SQL script inside Oracle to process the data directly.
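A minimal sketch of that approach, assuming a hypothetical directory object, file name, and two-column layout (adjust the columns to your actual table):

CREATE OR REPLACE DIRECTORY load_dir AS '/path/to/csv';

CREATE TABLE my_table_ext (
    id    NUMBER,
    value VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY load_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
        MISSING FIELD VALUES ARE NULL
    )
    LOCATION ('records.csv')
)
REJECT LIMIT UNLIMITED;

-- Set-based, direct-path style insert into the real table
INSERT /*+ APPEND */ INTO my_table SELECT * FROM my_table_ext;
COMMIT;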
Or, look at the following (copied from here:)
This issue is caused when using the bulk load option in parallel to load an Oracle target that has an index on it. An Oracle limitation.
To resolve this issue do one of the following (the drop-and-recreate option is sketched below):
* Change the target load option to Normal.
* Disable the enable parallel mode option in the relational connection browser.
* Drop the indexes before loading.
* Or create pre- and post-session SQL to drop and recreate the indexes and key constraints.
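If you take the drop-and-recreate route, the surrounding steps look roughly like this (the constraint and column names are illustrative; the sqlldr run goes in between):

ALTER TABLE my_table DROP CONSTRAINT my_table_pk;

-- ... run sqlldr with DIRECT=TRUE and PARALLEL=TRUE here ...

ALTER TABLE my_table ADD CONSTRAINT my_table_pk PRIMARY KEY (id);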

unique constraint violated error performance

I am inserting millions of records into a table; the operation will take hours, maybe a day. After 2 hours my PC's connection was dropped, so I want to repeat the insert from the start.
My Question
Which is faster: truncating the table and repeating the whole load, or creating a primary key and continuing, even though a 'unique constraint violated' error will be raised for every record that was already inserted during the first 2 hours?
Truncating the table (if it's a full refresh) is the best option, hands down. There's also the SKIP parameter if you use Oracle's SQL*Loader utility. Let me explain a bit.
Try loading the table with SQL*Loader using the DIRECT load option, which writes data blocks directly instead of issuing conventional INSERT statements.
With this kind of load you can also enable UNRECOVERABLE, which means little or no redo is written, so the load is much faster (often 70% or more) than conventional INSERTs.
The downside of a direct load is that all indexes on the table are marked UNUSABLE before the load starts (NOT NULL constraints are still enforced), and the data is then loaded. On successful completion, SQL*Loader re-enables each index by rebuilding it. If the load is interrupted for any reason, the errors are logged properly and the indexes are left UNUSABLE.
More details on DIRECT vs. conventional loading: please find here.
With SQL*Loader you can also load in conventional mode, which means it generates chunks of INSERTs from the file and processes them. With this type of load all indexes are left as they are and the table remains unharmed.
If an error occurs, SQL*Loader logs how many records were committed; on the next run you pass that count through the SKIP parameter and the load resumes from that point in the file (a sketch follows below).
More details on SQL*Loader: here.
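A resume with SKIP might look like this (the credentials, control file, and row count are purely illustrative; the committed-row count comes from the log of the failed run):

sqlldr userid=scott/tiger control=load.ctl log=load.log skip=1200000 direct=true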
Not sure how you are loading your table, but this is a classic situation for Oracle external tables.

Attempting to use SQL-Developer to analyze a system table dump created with 'exp'

I'm attempting to recover the data from a specific table that exists in a system table dump I performed earlier. I would like to append the rows existing in the dump to any rows that may exist in the active table. The problem is, it's likely that the name of the table in the dump is not the same as what exists in the database currently (they're dynamically created with a prefix of ARC_TREND_). In addition, I don't know the name of the table as it exists in the dump; I was hoping to use SQL Developer to analyze the dump file, as I can recognize the correct table by its columns and its existing rows.
While I'm going on blind faith that SQL Developer can work with my dump file, when attempting to open it I get a Java heap OutOfMemory exception. I've adjusted the maximum heap size from 640m to 1024m in both sqldeveloper.bat and in sqldeveloper.conf, but to no avail.
Can someone recommend a course of action for me to take to recover the data from a table which exists in an exp-created dump file? A graphical tool would be nice, but I'm no stranger to the command line. I need to analyze the tables that exist in the dump in order to pick the correct one out. Then I assume I can use imp TABLES= to bring it back into the active instance. It likely won't match the existing table name, so I will use SQL Developer to copy the rows from the imported table to the table where I need them to be.
The dump was taken from a Linux server running 10g, and will be imported to (the same server & database instance, upgraded) an 11g instance of the same database.
Thanks
Since you're referring to imp rather than impdp, I assume this wasn't exported with data pump. Either way, I doubt you'll get anything useful through SQL Developer.
Fortunately most of what you're trying to do is quite easy from the command line; just run imp with the INDEXFILE parameter, which will give you a text file containing all the table (commented out with REM) and index creation commands. From that you should be able to spot the table from its column names.
You can't really see any row data though, so if there's more than one possible match you might need to import several tables and inspect the data in them in the database to see which one you really want.
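For illustration (the credentials and file names are hypothetical), an INDEXFILE run imports no rows and just writes the DDL to a script you can then search for the ARC_TREND_ tables:

imp system/password@orcl file=system_dump.dmp full=y indexfile=ddl_listing.sql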

PostgreSQL temporary tables

I need to perform a query 2.5 million times. This query generates some rows which I need to AVG(column) and then use this AVG to filter the table from all values below average. I then need to INSERT these filtered results into a table.
The only way to do such a thing with reasonable efficiency, seems to be by creating a TEMPORARY TABLE for each query-postmaster python-thread. I am just hoping these TEMPORARY TABLEs will not be persisted to hard drive (at all) and will remain in memory (RAM), unless they are out of working memory, of course.
I would like to know if a TEMPORARY TABLE will incur disk writes (which would interfere with the INSERTs, i.e. slow the whole process down).
Please note that, in Postgres, the default behaviour for temporary tables is that they are not dropped at commit, and their data is preserved across transactions (the default is ON COMMIT PRESERVE ROWS). See ON COMMIT.
Temporary tables are, however, dropped at the end of a database session:
Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction.
There are multiple considerations you have to take into account:
If you do want to explicitly DROP a temporary table at the end of a transaction, create it with the CREATE TEMPORARY TABLE ... ON COMMIT DROP syntax.
In the presence of connection pooling, a database session may span multiple client sessions; to avoid clashes in CREATE, you should drop your temporary tables -- either prior to returning a connection to the pool (e.g. by doing everything inside a transaction and using the ON COMMIT DROP creation syntax), or on an as-needed basis (by preceding any CREATE TEMPORARY TABLE statement with a corresponding DROP TABLE IF EXISTS, which has the advantage of also working outside transactions e.g. if the connection is used in auto-commit mode.)
While the temporary table is in use, how much of it will fit in memory before overflowing on to disk? See the temp_buffers option in postgresql.conf
Anything else I should worry about when working often with temp tables? A VACUUM is recommended after you have DROPped temporary tables, to clean up any dead tuples from the catalog. Under the default settings, autovacuum will also do this for you periodically.
Also, unrelated to your question (but possibly related to your project): keep in mind that, if you have to run queries against a temp table after you have populated it, then it is a good idea to create appropriate indices and issue an ANALYZE on the temp table in question after you're done inserting into it. By default, the cost-based optimizer will assume that a newly created temp table has ~1000 rows and this may result in poor performance should the temp table actually contain millions of rows (see the sketch below).
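A minimal sketch of that pattern, assuming hypothetical source, target, and column names standing in for the ones in your workload:

BEGIN;

-- Temp table lives only for this transaction (ON COMMIT DROP)
CREATE TEMPORARY TABLE tmp_scores ON COMMIT DROP AS
    SELECT id, score FROM measurements WHERE batch_id = 42;

-- Index and fresh statistics before querying it further
CREATE INDEX tmp_scores_score_idx ON tmp_scores (score);
ANALYZE tmp_scores;

-- Keep only the rows at or above the average and persist them
INSERT INTO filtered_results (id, score)
    SELECT id, score FROM tmp_scores
    WHERE score >= (SELECT AVG(score) FROM tmp_scores);

COMMIT;  -- tmp_scores is dropped here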
Temporary tables provide only one guarantee - they are dropped at the end of the session. For a small table you'll probably have most of your data cached in memory. For a large table I guarantee that data will be flushed to disk periodically as the database engine needs more working space for other requests.
EDIT:
If you're absolutely in need of RAM-only temporary tables you can create a table space for your database on a RAM disk (/dev/shm works). This reduces the amount of disk IO, but beware that it is currently not possible to do this without a physical disk write; the DB engine will flush the table list to stable storage when you create the temporary table.
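A sketch of that setup (the path and tablespace name are illustrative; the directory must already exist and be owned by the postgres OS user):

-- Run as a superuser; /dev/shm is a tmpfs mount on most Linux systems
CREATE TABLESPACE ramdisk LOCATION '/dev/shm/pg_ramdisk';

-- Have temporary objects land there by default for this session
SET temp_tablespaces = 'ramdisk';

CREATE TEMPORARY TABLE tmp_fast (id integer, score numeric);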
