Questions on SQL*Loader commit range and display - oracle

As per my observation, when we load data into Oracle tables using SQL*Loader, 64 records are committed at once by default.
Could you please let me know if we can change this default insertion/commit limit to a number other than 64?
Also, can we suppress the loading progress display, shown below, on the console?
Commit point reached - logical record count 64
Commit point reached - logical record count 128
Commit point reached - logical record count 192

From the reference:
rows -- number of rows in conventional path bind array or between direct
path data saves
(Default: Conventional path 64, Direct path all)
So you'd specify rows=1024 or whatever on the command line or in the parameter file.
As for avoiding the display, look at the SILENT option: SILENT=ALL is probably more than you want, but SILENT=FEEDBACK is documented to suppress the "Commit point reached" messages specifically. Otherwise, filter those lines out with grep if you don't want to see them.
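For reference, a command-line sketch combining the two (the username, password, and control file name are placeholders):
sqlldr scott/tiger control=load_data.ctl rows=1024 silent=feedback
Here rows sets the conventional path bind array size per the reference above, and silent=feedback drops only the commit-point lines.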

Related

Oracle database performance: How to calculate read response time for a single block and for multiple blocks for Oracle Database 12c?

How to calculate read response time for a single block and for multiple blocks for Oracle Database 12c?
Is there a metric view where we can see information pertaining to blocks, like v$stat and v$sysmetric?
I looked around on different websites, including Oracle's, but could not find much information except 'Average Synchronous Single-Block Read Latency' in the v$sysmetric view of my database.
Are Average Synchronous Single-Block Read Latency and single-block read response time the same thing?
You can use V$FILESTAT. This view contains the number of physical reads and writes that happened at the file level, split into single-block and multiblock I/Os. You can check:
select file#, phyrds, phywrts
from   v$filestat
You can then do a select on v$datafile to get the name of the data file:
select * from v$datafile where file# = (file number taken from v$filestat)
From the doc:
This view displays the number of physical reads and writes done and
the total number of single-block and multiblock I/Os done at file
level. As of Oracle Database 10g Release 2 (10.2), this view also
includes reads done by RMAN processes for backup operations.
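For example, a sketch that joins the two views and derives an average single-block read time (SINGLEBLKRDS and SINGLEBLKRDTIM are documented V$FILESTAT columns; times are reported in hundredths of a second, so multiplying by 10 converts to milliseconds; verify the column semantics against your version's reference):
select d.name,
       f.phyrds,
       f.phywrts,
       f.singleblkrds,
       round(f.singleblkrdtim * 10 / nullif(f.singleblkrds, 0), 2) as avg_single_blk_read_ms
from   v$filestat f
join   v$datafile d on d.file# = f.file#;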

SSIS Lookup transformation not finding matches

I have a Lookup transformation that does not seem to be finding obvious matches. I have an input file with 43 records that all include the same CustomerID, which is set as an 8-byte signed integer. I am using the Lookup to see if the CustomerID already exists in my destination table. In the destination table the CustomerID is defined as BigInt.
For testing, I truncated the Lookup (destination) table. I have tried all three cache settings with the same results.
When I run the SSIS package, all 43 records are sent through the No Match Output side. I would think that only the 1st record should go that way and all the others would be considered a match, since they have the same CustomerID. Additionally, if I run the job a second time (without truncating the destination), they are all flagged as Matched.
It seems as if the cache is not being looked at in the Lookup. Ultimately I want the No Match records to be written to the destination table and the Matched records to have further processing.
Any ideas?
The Lookup transformation is working as expected. I am not sure what your understanding of Lookup is, so I'll go point by point.
For testing, I truncated the Lookup(destination) table. I have tried
all three Cache settings with the same results.
When I run the SSIS package, all 43 records are sent through the No
Match Output side
The above behavior is expected. After the truncate, the Lookup is essentially trying to find those 43 records within your truncated destination table. Since it can't find any, it flags them as new records, i.e. the No Match output side.
if I run the job a second time(without truncating the destination)
then they are all flagged as Matched
In this case, all 43 records from the file are found in the destination table, hence the Lookup treats them as duplicates and they are flagged as Matched output.
I am using the Lookup to see if the CustomerID already exist in my
destination table
To achieve this, all you need to do is send the Matched output to some staging table that can be periodically truncated (as those rows are duplicates), and send all the No Match output to your destination table.
You can post a screenshot of your Lookup as well in case you want further help.
The Lookup can't be used this way. SSIS data flows execute in a transaction, so while the package is running no rows have been written to the destination until the entire data flow completes. Regardless of the Cache setting, the new rows being sent to your destination table are not going to be considered by the Lookup while it's running. When you run it again, those rows will be considered. This is expected behavior.
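To illustrate, within a single run the Lookup is roughly equivalent to the query below evaluated against the destination as it was before the data flow started, which is why all 43 rows, including the duplicates within the same file, come out of the No Match side (the table and column names here are assumptions):
select s.CustomerID
from   StagedInput as s
where  not exists (select 1
                   from   dbo.Destination as d
                   where  d.CustomerID = s.CustomerID);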

Doing a string length check on a SQL*Loader input field

I'm reading data from a fixed-length text file and loading it into a table using fixed-length processing.
I want to check the input line length so that I can discard the records that do not match the fixed length and log them into an error table.
Example:
Load into the Input_Log table if the line meets the specified length.
Load into the Input_Error_Log table if the input line length is less than or greater than the fixed line length.
I believe you would be better served by bulk loading your data into a staging table, then loading into the production table from there via a stored procedure, where you can apply rules via normal PL/SQL and DML to your heart's content. This is a typical best practice anyway.
sqlldr isn't really the tool to get too complicated in, even if you could do what you want. Maintainability and restartability become more complicated when you add complexity to a tool that's really designed for bulk loading. Add the complexity to a proper program.
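As a rough sketch of that approach, assuming a staging table STG_INPUT with a single RAW_LINE column, a fixed length of 100, and the two log tables from the question (all of these names and the length are assumptions):
create or replace procedure load_input_log is
begin
  -- lines that meet the fixed length go to the good table
  insert into input_log (line_text)
    select raw_line
    from   stg_input
    where  length(raw_line) = 100;

  -- everything else goes to the error table
  insert into input_error_log (line_text)
    select raw_line
    from   stg_input
    where  length(raw_line) <> 100
       or  raw_line is null;

  commit;
end load_input_log;
/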
Let us know what you come up with.

Oracle SQL*loader running in direct mode is much slower than conventional path load

In the past few days I've been playing around with Oracle's SQL*Loader in an attempt to bulk load data into Oracle. After trying out different combinations of options, I was surprised to find that a conventional path load runs much quicker than a direct path load.
A few facts about the problem:
Number of records to load is 60K.
Number of records in target table, before load, is 700 million.
Oracle version is 11g r2.
The data file contains date, character (ascii, no conversion required), integer, float. No blob/clob.
The table is partitioned by hash. The hash key is the same as the PK.
Parallelism of the table is set to 4, while the server has 16 CPUs.
The index is locally partitioned. Parallelism of the index (from ALL_INDEXES) is 1.
There's only 1 PK and 1 index on the target table. The PK constraint is built using the index.
A check on the index partitions revealed that record distribution among partitions is pretty even.
Data file is delimited.
APPEND option is used.
Selecting and deleting the loaded data through SQL is pretty fast, almost an instant response.
With conventional path, loading completes in around 6 seconds.
With direct path load, loading takes around 20 minutes. The worst run took 1.5 hours to complete, yet the server was not busy at all.
If skip_index_maintenance is enabled, direct path load completes in 2-3 seconds.
I've tried quite a number of options, including UNRECOVERABLE, SORTED INDEXES, and MULTITHREADING (I am running SQL*Loader on a multi-CPU server), but none of them gives a noticeable improvement.
Here's the wait event I kept seeing during the time SQL*Loader runs in direct mode:
Event: db file sequential read
P1/2/3: file#, block#, blocks (check from dba_extents that it is an index block)
Wait class: User I/O
Does anyone have any idea what has gone wrong with the direct path load? Or is there anything I can check further to really dig into the root cause of the problem? Thanks in advance.
I guess you are falling foul of this:
"When loading a relatively small number of rows into a large indexed table
During a direct path load, the existing index is copied when it is merged with the new index keys. If the existing index is very large and the number of new keys is very small, then the index copy time can offset the time saved by a direct path load."
from When to Use a Conventional Path Load in: http://download.oracle.com/docs/cd/B14117_01/server.101/b10825/ldr_modes.htm
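For what it's worth, the combination the question found fast can be reproduced roughly like this (user, control file, index, and partition names are placeholders), bearing in mind that SKIP_INDEX_MAINTENANCE leaves the affected index partitions UNUSABLE, so they have to be rebuilt afterwards:
sqlldr scott/tiger control=load_big.ctl direct=true skip_index_maintenance=true
followed by a rebuild of each unusable local index partition, e.g.:
alter index my_local_idx rebuild partition p01;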

Upload DB2 data into an Oracle database - fixing junk data

I've been given a DB2 export of data (around 7 GB) with associated DB2 control files. My goal is to upload all of the data into an Oracle database. I've almost succeeded in this - I took the route of converting the control files into SQL*Loader CTL files and it has worked for the most part.
However, I have found that some of the data files contain terminators and junk data in some of the columns, which gets loaded into the database, causing obvious issues with matching on that data. E.g., a column that should contain '9930027130' will show length(trim(col)) = 14: 4 bytes of junk data.
My question is, what is the best way to eliminate this junk data from the system? I hope there's a simple addition to the CTL file that allows it to replace the junk with spaces; otherwise I can only think of writing a script that analyses the data and replaces nulls/junk with spaces before running SQL*Loader.
What, exactly, is your definition of "junk"?
If you know that a column should only contain 10 characters of data, for example, you can add a NULLIF( LENGTH( <<column>> ) > 10 ) to your control file. If you know that the column should only contain numeric characters (or alphanumerics), you can write a custom data cleansing function (i.e. STRIP_NONNUMERIC) and call that from your control file, i.e.
COLUMN_NAME position(1:14) CHAR "STRIP_NONNUMERIC(:COLUMN_NAME)",
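A minimal sketch of the kind of cleansing function referred to above (the STRIP_NONNUMERIC name comes from the answer; the body here is an assumption):
create or replace function strip_nonnumeric (p_val in varchar2)
  return varchar2
  deterministic
is
begin
  -- remove every character that is not a digit
  return regexp_replace(p_val, '[^0-9]', '');
end strip_nonnumeric;
/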
Depending on your requirements, these cleansing functions and the cleansing logic can get rather complicated. In data warehouses that are loading and cleansing large amounts of data every night, data is generally moved through a series of staging tables as successive rounds of data cleansing and validation rules are applied, rather than trying to load and cleanse all the data in a single step.
A common approach would be, for example, to load all the data into VARCHAR2(4000) columns with no cleansing via SQL*Loader (or external tables). Then you'd have a separate process move the data to a staging table that has the proper data types, NULL-ing out data that couldn't be converted (i.e. non-numeric data in a NUMBER column, impossible dates, etc.). Another process would come along and move the data to another staging table where you apply domain rules: things like a social security number has to be 9 digits, a latitude has to be between -90 and 90 degrees, or a state code has to be in the state lookup table. Depending on the complexity of the validations, you may have more processes that move the data to additional staging tables to apply ever stricter sets of validation rules.
"A column should contain '9930027130', will show length(trim(col)) = 14 : 4 Bytes of junk data. "
Do a SELECT DUMP(col) to determine the strange characters. Then decide whether they are always invalid, valid in some cases, or valid but interpreted wrongly.
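For example (the table and column names are made up):
select col, dump(col)
from   some_table
where  length(trim(col)) > 10;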
