How to extract data from multiple databases and replicate to one database with a different table structure using Oracle GoldenGate?

I have 6 tables to extract data from in each of 4 databases, and I have to replicate all that data into 6 tables of a single target database. The target tables have just one extra column, 'instance_id', which shows which database each row came from. I currently have one extract process on each source database and 4 replicat processes on the target. I want the 'instance_id' column to be populated automatically as soon as a row is written to the target table by OGG replication. I know there is the SQLEXEC statement, which can run SQL queries in OGG, but I don't know where or how to use it to solve my problem here.

If you have 4 sources, you have 4 sets of trail files and 4 replicats. In each replicat, include your instance_id in the column MAP. Also, if getting data from the 4 sources is going to cause primary key collisions, you will have to include instance_id in your PK definition. It would look something like this:
MAP schema.table, TARGET schema.table,
COLMAP(USEDEFAULTS, instance_id = 1),
KEYCOLS(pkcol, instance_id);
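For concreteness, here is a minimal sketch of one of the four replicat parameter files; the replicat, alias, schema, table and key column names are hypothetical, and each of the four replicats would hard-code its own instance_id literal (1 through 4). No SQLEXEC is needed, because a literal assigned in COLMAP is applied by the replicat itself when it builds each insert:

REPLICAT rep1
USERIDALIAS ogg_target
MAP src.orders, TARGET tgt.orders,
COLMAP (USEDEFAULTS, instance_id = 1),
KEYCOLS (order_id, instance_id);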

Related

Change schema in an Impala/Hive table with a very large amount of data?

We have a Hive table stored on HDFS with 800+ columns and >65 billion rows (and growing) and need to:
Remove a column with a complex type (small array)
Add a column with a complex type (small array)
Possibly add a handful of other columns (simple type, e.g. string or int)
Modify the contents of 3 columns for every row in the table (effectively read it in, make a simple change, and write it back out to the same column and row it came from). I realise this is probably a separate operation from the other three requirements above.
We could set up a new empty table with the new schema and copy the data over (using CREATE TABLE xxxxx AS SELECT ... or INSERT INTO xxxx SELECT ...), but tests suggest it would take 1-3 weeks running non-stop. And it's possible we may need to make further minor modifications of this kind in the future.
Is there an efficient, sensible alternative to copying the whole table? Would ALTER TABLE work (at least for the structural changes, items 1-3 above)? What are the pros and cons of each option?
The table is going to be queried using Impala, in a Zeppelin-based interface.
Thanks for any advice.
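For the structural items (1-3), the metadata-only DDL route in Hive would look roughly like the sketch below; the table and column names are made up for illustration, and afterwards you would issue INVALIDATE METADATA in Impala so it picks up the new schema:

-- Adding columns only touches the metastore, so it is effectively instant:
ALTER TABLE planning.big_table ADD COLUMNS (new_attrs ARRAY<STRING>);

-- Hive has no DROP COLUMN; REPLACE COLUMNS re-declares the columns you keep.
-- With 800+ columns the statement is long, but it is still metadata-only
-- (and only supported for tables using native SerDes):
-- ALTER TABLE planning.big_table REPLACE COLUMNS (col1 STRING, col2 INT, ...);

Item 4 (rewriting the contents of 3 columns in every row) cannot be done as a metadata change; that one genuinely requires reading and rewriting the data.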

How to avoid concurrency error for multiple files loading into a single table in ODI 12c

I have a scenario where 3 files of 5 million lines each, holding 3 weeks of data, are bulk mapped to a staging table. They are loaded in parallel. If the data transfer for one file fails with a concurrency error, what is the best way to load the data of the 3 files effectively into the staging table?
(I was asked this in an interview.)
In ODI 12c, Parallel Target Table Load can be achieved by selecting the Use Unique Temporary Object Names checkbox in the Physical tab of a mapping. The work tables for each concurrent session will have a unique name.
I discuss it in more detail in the Parallel Target Table Load section of this blog on ODI 12c new features.

How do I get the table count for all tables in the same folder in Hadoop Hive? And from a SAS server?

I want to get the row count for all tables under a folder called "planning" in a Hadoop Hive database, but I couldn't figure out a way to do so. Most of these tables cannot be linked to one another, so I can't use a full join with a common key.
Is there a way to do the counts and output them to one table, with each row of the result representing one table name?
The table names that I have:
add_on
sales
ppu
ssu
car
Secondly, I am a SAS developer. Is the above process doable in SAS? I tried the data dictionary, but "nobs" is completely blank for this library, while all other SAS data sets display "nobs" properly. I wonder why, and how to work around it.
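One way to get all five counts in a single result set, assuming "planning" is the Hive schema holding the tables above, is a UNION ALL query in which each branch contributes one row:

SELECT 'add_on' AS table_name, COUNT(*) AS row_count FROM planning.add_on
UNION ALL
SELECT 'sales', COUNT(*) FROM planning.sales
UNION ALL
SELECT 'ppu', COUNT(*) FROM planning.ppu
UNION ALL
SELECT 'ssu', COUNT(*) FROM planning.ssu
UNION ALL
SELECT 'car', COUNT(*) FROM planning.car;

From SAS, the same statement can be sent to Hive via explicit SQL pass-through. That also hints at why "nobs" is blank: for a library pointing at an external database, SAS generally does not know row counts without scanning the tables, so the dictionary tables leave "nobs" empty.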

Oracle: synchronize 2 tables

I have the following scenario and need to solve it in Oracle:
Table A is on one DB server.
Table B is on a different server.
Table A will be populated with data. Whenever something is inserted into Table A, I want to copy it to Table B. Table B has nearly the same columns, but sometimes I just want to take the content of 2 columns from Table A, concatenate it, and save it to Table B.
I am not very familiar with Oracle, but after researching on Google, some say you can do it with triggers or views. How would you do it? So in general: there is a table which will be populated, and its content should be copied to a different table.
This is the solution I have come up with so far:
create public database link other_db
connect to user identified by pw
using 'tns-entry';
CREATE TRIGGER modify_remote_my_table
AFTER INSERT ON my_table
BEGIN
  INSERT INTO ....?
END;
/
How can I select the latest row that was inserted?
If these two tables are in databases on two different servers, then you will need a database link (db link) created in the Table A schema so that it can access (read/write) the Table B data.
Step 1: Create a database link in the Table A server's DB pointing to the Table B server's DB.
Step 2: Create a trigger on Table A which inserts the data into Table B over the database link. You can customize (e.g. concatenate) the values inside the trigger before inserting them into Table B.
This link should help you
http://searchoracle.techtarget.com/tip/How-to-create-a-database-link-in-Oracle
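A minimal sketch of step 2, reusing the my_table and other_db names from the question; remote_table, id, col1 and col2 are hypothetical. Making the trigger row-level (FOR EACH ROW) also answers the "latest row" question: the :NEW record is exactly the row being inserted, so nothing needs to be selected:

CREATE OR REPLACE TRIGGER modify_remote_my_table
AFTER INSERT ON my_table
FOR EACH ROW
BEGIN
  -- concatenate two local columns into one remote column
  INSERT INTO remote_table@other_db (id, combined_value)
  VALUES (:new.id, :new.col1 || ' ' || :new.col2);
END;
/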
Yes, you can do this with triggers, but there may be a few disadvantages:
What if database B is not available? -> You need exception handling in your trigger.
What if database B was unavailable for 2 hours? You inserted data into database A which is now missing in database B. -> You have to do awkward things like temporarily inserting into a cache table in database A.
Performance. The performance of inserting a lot of data this way will be ugly: each time you insert a row, Oracle starts the PL/SQL engine to insert the data into the remote database.
Maybe you could instead think about using MViews (materialized views) to replicate the data via the database link. You can then build your queries so that they access tables from database B and pull in the required data from database A by joining the MViews.
You can also use fast refresh to replicate the data (almost) in real time.
From the perspective of an Oracle database admin, this makes a lot more sense than the trigger approach.
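A sketch of that approach, assuming a link called to_db_a that points from database B back to database A (the question's link goes the other way) and the my_table name from the question:

-- On database A: a materialized view log makes incremental (fast) refresh possible.
CREATE MATERIALIZED VIEW LOG ON my_table WITH PRIMARY KEY;

-- On database B: pull only the changes over the link, here once per minute.
CREATE MATERIALIZED VIEW my_table_mv
REFRESH FAST START WITH SYSDATE NEXT SYSDATE + 1/1440
AS SELECT * FROM my_table@to_db_a;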
Try this code: https://gist.github.com/anonymous/e3051239ba401e416565cdd912e0de8c
It uses ORA_ROWSCN to sync tables across two different Oracle databases. Database links are considered rather insecure, Oracle's own replication options carry licence costs these days, and some of the other options are deprecated as well.

What's the best way to perform selective record replication in an Oracle database?

Suppose the following scenario:
I have a master database that contains lots of data. In this database I have a key table that I'm going to call DataOwners for this example. The DataOwners table has 4 records, and each record of every other table in the database "belongs", directly or indirectly, to a record of DataOwners; by belongs I mean it is linked to it with foreign keys.
I also have 2 other slave databases with the exact same structure as my master database, which are only updated through replication from the master. But SlaveDatabase1 should only have records from DataOwner 2, and SlaveDatabase2 should only have records from DataOwners 1 and 3, whereas MasterDatabase has records of DataOwners 1, 2, 3 and 4.
Is there any tool for Oracle that allows me to do this kind of selective record replication?
If not, is there any way to improve my current replication method, which is:
add to each table a trigger that inserts the record changes in a group of replication tables
execute the commands of the replication tables at selected slaves
The simplest option would be to define materialized views in the various slave databases that replicate just the data that you want. So, for example, if there is a table A in the master database, then in slave database 1, you'd create a materialized view
CREATE MATERIALIZED VIEW a
<<refresh criteria>>
AS
SELECT a.*
FROM a@to_master a,
     dataOwners@to_master dm
WHERE a.dataOwnerID = dm.dataOwnerID
AND dm.some_column = <<some criteria that selects DataOwner2>>
while slave database 2 has a very similar materialized view
CREATE MATERIALIZED VIEW a
<<refresh criteria>>
AS
SELECT a.*
FROM a@to_master a,
     dataOwners@to_master dm
WHERE a.dataOwnerID = dm.dataOwnerID
AND dm.some_column = <<some criteria that selects DataOwners 1 & 3>>
Of course, if the dataOwnerID can be hard-coded, you could simplify things and avoid doing the join. I'm guessing, though, that there is some column in the DataOwners table that identifies which slave a particular owner should be replicated to.
Assuming that you want only incremental changes to be replicated, you'd need to create some materialized view logs on the base tables in the master database. And you would probably want to configure refresh groups on the slave databases so that all the materialized views would refresh at the same time and would be transactionally consistent with each other.
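Sketched below for slave database 1 with hypothetical names; one caveat is that fast refresh of a join materialized view like the one above also requires the base tables' rowids in the view's select list:

-- On the master: ROWID-based logs support fast refresh of join MViews.
CREATE MATERIALIZED VIEW LOG ON a WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON dataOwners WITH ROWID;

-- On the slave: a refresh group makes its MViews refresh together,
-- transactionally consistent with one another.
BEGIN
  DBMS_REFRESH.MAKE(
    name      => 'slave1_group',
    list      => 'A',                 -- comma-separated list of MViews
    next_date => SYSDATE,
    interval  => 'SYSDATE + 1/24');   -- refresh hourly
END;
/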
Oracle GoldenGate software can do all these tasks. Insert/Update/Delete operations are applied in the same order as on the master DB, so it avoids foreign key and other constraint issues.
A MasterDatabase extract generates a trail file, and the data is then split out to DBs 1, 2, 3 and 4.
It can also do multi-directional replication, i.e. DB 1 sends data back to the master DB.
Besides GoldenGate, a trigger may be your other option, but it requires some programming.
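With GoldenGate, the selective part is a filter in each replicat's MAP statement; the schema and table names below are hypothetical:

-- Replicat for SlaveDatabase1: keep only DataOwner 2 rows.
MAP master.a, TARGET slave1.a, WHERE (dataOwnerID = 2);

-- Replicat for SlaveDatabase2: keep DataOwners 1 and 3.
MAP master.a, TARGET slave2.a, WHERE (dataOwnerID = 1 OR dataOwnerID = 3);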
