How to avoid a concurrency error when loading multiple files into a single table in ODI 12c

I have a scenario where there are 3 files of 5 million lines each, containing 3 weeks of data, bulk mapped to a staging table. They run in parallel. If the data transfer for one file fails with a concurrency error, what is the best way to load the data of the 3 files effectively into the staging table?
(I was asked this in an interview.)

In ODI 12c, Parallel Target Table Load can be achieved by selecting the Use Unique Temporary Object Names checkbox in the Physical tab of a mapping. The work tables for each concurrent session will have a unique name.
I discuss it in more detail in the Parallel Target Table Load section of this blog on ODI 12c new features.

Related

Oracle Advanced Queues versus a Small Oracle Database Table

I'm looking for a simple way to communicate between two databases; there currently exists a database link between both databases.
I want to process a job on database 1 for a batch of records (a batch code for each batch of records). Once the process has finished on database 1 and all the batches of records have been processed, I want database 2 to see that database 1 has processed a number of batches (batch codes), either by querying an Oracle table or an Oracle Advanced Queue which sits on either database 1 or database 2.
Database 2 will process the batches of records that are on database 1 through a view over the database link, using each batch code, and update the status of that batch to complete.
I want to be able to update the Oracle Advanced Queue or database table with the batch number, progress status ('S' started, 'C' completed) and status date.
Table name: batch_records
Table columns: batch_no, status, status_date
Questions:
Can this be done by a simple database table rather than a complex Oracle Advanced Queue?
Can a table be updated over a database link?
Are there any examples of this?
To answer your questions first:
Yes, I believe so.
Yes, it can. But if there are many rows involved, it can be pretty slow.
Probably.
A database link is the way to communicate between two databases. If those jobs run on database 1 (DB1), I'd suggest you keep it there - in DB1. Doing work over a database link invites problems of various kinds: it might be slow, and you can't do everything over a database link (LOBs, for example). One option is to schedule a job (using DBMS_SCHEDULER, or DBMS_JOB, which is quite OK for simple things). Let the procedure maintain its job status in some table (that would be the "simple table" from your 1st question) in DB1, which will be read by DB2.
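A minimal sketch of that approach on DB1; the status table reuses the batch_records layout from the question, while the package and job names are hypothetical:

-- On DB1: the status table maintained by the batch job
CREATE TABLE batch_records (
  batch_no    NUMBER       PRIMARY KEY,
  status      VARCHAR2(1),              -- 'S' started, 'C' completed
  status_date DATE
);

-- On DB1: schedule the processing procedure (batch_pkg.process_batches is hypothetical)
BEGIN
  DBMS_SCHEDULER.create_job(
    job_name        => 'process_batches_job',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'batch_pkg.process_batches',
    repeat_interval => 'FREQ=DAILY; BYHOUR=7',
    enabled         => TRUE);
END;
/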
How? Read it directly, or create a materialized view which will be refreshed in a scheduled manner (e.g. every morning at 07:00), on demand (not that good an idea), or on commit (once the DB1 procedure does the job and commits the changes, the materialized view will be refreshed).
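For the materialized view route, a sketch on DB2, refreshed every morning at 07:00 (the database link name db1_link is an assumption):

CREATE MATERIALIZED VIEW batch_records_mv
  REFRESH COMPLETE
  START WITH SYSDATE
  NEXT TRUNC(SYSDATE + 1) + 7/24
AS SELECT batch_no, status, status_date FROM batch_records@db1_link;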
If there aren't that many rows involved, I'd probably read the DB1 status table directly, and think of other options later (if necessary).

Oracle Data Integrator (ODI 12.2.1) - Load plan record count issue

I have come across a scenario in my project. I am loading data from a file to a table using ODI and running my interfaces through a load plan. I have 1,000 records in my source file and I am also getting 1,000 records in the target, but when I check the ODI load plan execution log it shows the number of inserts as 2,000. Can anyone please help, or is this an ODI bug?
The number of inserts does not only count the inserts into the target table but also all the inserts happening in temporary tables. Depending on the knowledge modules (KMs) used in an interface, ODI might load data into a C$_ table (LKM) or an I$_ table (IKM/CKM). The rows loaded into these tables are also counted, so 1,000 rows staged into a C$_ table plus the same 1,000 rows inserted into the target are reported as 2,000 inserts.
You can look at the code generated in the Operator to check whether your KMs are using these temporary tables. You can also simulate an execution to see the code that would be generated.

How to extract data from multiple databases and replicate it to one database with a different table structure using Oracle GoldenGate?

I have 6 tables to extract data from in each of the 4 databases, and I have to replicate all that data into 6 tables of a single database. The target tables have just one extra column, 'instance_id', which shows which database the data came from. Now I have one extract process for each database and 4 replicat processes in the target database. I want the 'instance_id' column to be populated automatically as soon as a row is entered into the target table by OGG replication. I know there is the SQLEXEC statement which can run SQL queries in OGG, but I don't know where and how to use it to solve my problem here.
If you have 4 sources, you have 4 sets of trail files and 4 replicats. In each replicat, include your instance_id in the column MAP. Also, if getting data from the 4 sources is going to cause primary key collisions, you will have to include instance_id in your PK definition. It would look something like:
MAP schema.table, TARGET schema.table,
COLMAP(USEDEFAULTS, instance_id = 1),
KEYCOLS(pkcol, instance_id);
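On the target side, widening the primary key could look like the sketch below (schema.table and pkcol are the same placeholders used in the MAP above; the datatype and default of instance_id are assumptions):

-- Add the source-identifying column and make it part of the primary key
ALTER TABLE schema.table ADD (instance_id NUMBER DEFAULT 0 NOT NULL);
ALTER TABLE schema.table DROP PRIMARY KEY;
ALTER TABLE schema.table ADD CONSTRAINT table_pk PRIMARY KEY (pkcol, instance_id);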

Can EF6 use Oracle database links?

We have a legacy app that I am rewriting in .NET. All of our databases are Oracle and make use of database links. Is there any way for Entity Framework 6 to generate models based on tables located in a different database?
Currently the legacy app gets data from a table like this:
SELECT * FROM emp@foo2;
where its DB connection is to database foo, which has a database link to the database foo2.
I would like to reproduce this using EF6. So far all I have found regarding this is this question.
You can do two things that EF 4 or higher will work with:
CREATE VIEW EMP AS SELECT * FROM emp@foo2;
CREATE MATERIALIZED VIEW EMP AS SELECT * FROM emp@foo2;
LOBs are not accessible across a database link without some contorted PL/SQL processing to read the LOB piece by piece.
I believe fast refresh does not work across database links so you must consider the size of the table on the linked database. If you are refreshing a million rows you may find the time to do this is an issue. Most large tables are full of tombstone data that never changes so a timestamp column with the last modified date could help you create a package that only picks out the changed data.
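A rough sketch of that incremental idea (the local copy table, the last_modified column and the bind variable are all assumptions), pulling only changed rows over the link instead of refreshing everything:

MERGE INTO emp_copy t
USING (SELECT empno, ename, last_modified
         FROM emp@foo2
        WHERE last_modified > :last_sync) s
ON (t.empno = s.empno)
WHEN MATCHED THEN
  UPDATE SET t.ename = s.ename, t.last_modified = s.last_modified
WHEN NOT MATCHED THEN
  INSERT (empno, ename, last_modified)
  VALUES (s.empno, s.ename, s.last_modified);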
If you are doing complicated joins with either of these, ensure that Oracle considers the column that would be the primary key to be not null.
You can add a primary key on views and materialized views, but it must be disabled. See here for details.
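For example, a disabled, non-validated key on the plain view (EMPNO as the key column is an assumption) gives EF something to infer an entity key from:

ALTER VIEW emp ADD CONSTRAINT emp_pk PRIMARY KEY (empno) DISABLE NOVALIDATE;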

How to tune Oracle's SQL*Loader append?

I am writing a Java program that creates a CSV file with 6,800,000 records conforming to specific distribution parameters and populates a table using Oracle's SQL*Loader.
I am testing my program with different numbers of records (50,000 and 500,000). The CSV file generation by itself is quite fast; using concurrency, it takes milliseconds to create these records and write them to a file.
Inserting said records, on the other hand, is taking too long. Reading the log file generated by SQL*Loader, it takes 00:00:32.90 to populate the table with 50,000 records and 00:07:58.83 to populate it with 500,000.
SQL*Loader benchmarks I've googled show much better performance, such as 2 million rows in less than 2 minutes. I've followed this tutorial to improve the load time, but it barely changed at all. There's obviously something wrong here, but I don't know what.
Here's my control file:
OPTIONS (SILENT=ALL, DIRECT=TRUE, ERRORS=50, COLUMNARRAYROWS=50000, STREAMSIZE=500000)
UNRECOVERABLE LOAD DATA
APPEND
INTO TABLE MY_TABLE
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
...
Another important piece of information: I've tried using PARALLEL=TRUE, but I get the ORA-26002 error (table MY_TABLE has an index defined upon it). Unfortunately, running with SKIP_INDEX_MAINTENANCE renders the index UNUSABLE.
What am I doing wrong?
Update
I have noticed that soon after running the program (less than a second), all rows are already present in the database. Yet, SQL*Loader is still busy and only finishes after 32-45 seconds.
What could it be doing?
One thought would be to create an external table pointing at the CSV file. Then, after creating the file, you can run a SQL script inside Oracle to process the data directly.
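A rough sketch of that approach, assuming a directory object named data_dir, a file my_data.csv and two illustrative columns:

CREATE TABLE my_table_ext (
  id   NUMBER,
  name VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('my_data.csv')
);

-- Direct-path insert from the external table into the real target
INSERT /*+ APPEND */ INTO my_table SELECT * FROM my_table_ext;
COMMIT;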
Or, look at the following (copied from here):
This issue is caused by using the bulk load option in parallel to load an Oracle target that has an index on it. It is an Oracle limitation.
To resolve this issue, do one of the following:
· Change the target load option to Normal.
· Disable the enable parallel mode option in the relational connection browser.
· Drop the indexes before loading.
· Create pre- and post-session SQL to drop and re-create the indexes and key constraints (see the sketch after this list).
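A hedged sketch of that last option (the index name and column are assumptions):

-- Pre-load: drop the index so the parallel direct-path load is allowed
DROP INDEX my_table_idx;

-- ... run SQL*Loader here with DIRECT=TRUE, PARALLEL=TRUE ...

-- Post-load: rebuild the index, optionally NOLOGGING and in parallel to speed it up
CREATE INDEX my_table_idx ON my_table (id) NOLOGGING PARALLEL 4;
ALTER INDEX my_table_idx NOPARALLEL;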
