How to join three heterogeneous sources using a single Joiner transformation

How can we join three heterogeneous sources using a single Joiner transformation? The sources might be three flat files, tables from three different relational databases (Oracle, Teradata, SQL Server), or one flat file, one Oracle table, and one SQL Server table.
We need to use only one Joiner; how can we implement this?

If you have three flat files, it is not possible to join them with a single Joiner: as the Informatica Joiner Transformation documentation states, a Joiner transformation joins two heterogeneous sources.
If you have two tables and one flat file, you can use a SQL override to join the two tables (assuming both are reachable through the same connection), then use a single Joiner to join that result with the flat file, as sketched below.
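A minimal sketch of such an override in the Source Qualifier; the table and column names here are hypothetical:

    -- Hypothetical SQL override: joins the two relational tables into one
    -- pipeline, which the single Joiner then joins with the flat file source.
    SELECT o.ORDER_ID,
           o.CUSTOMER_ID,
           c.CUSTOMER_NAME
    FROM   ORDERS o
    JOIN   CUSTOMERS c
      ON   c.CUSTOMER_ID = o.CUSTOMER_ID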

Related

How to compare data between two tables where one table is in Oracle and the other is in Postgres

I have two tables with the same structure, one in Oracle and the other in Postgres, and I would like to compare the data between them. I cannot use a DB link because of connectivity issues.
I have copied the contents of both tables into an Excel sheet, but I am still having trouble comparing the data.
Please suggest a suitable way to compare the data between the two tables.

Is it possible to make a join between tables that are in different databases?

I have two databases, one Oracle and the other Postgres, and I need to perform a join between tables in those databases. Is there any way to make this possible?
That is simple.
Install oracle_fdw in the PostgreSQL database and define a foreign table for the Oracle table.
Then you can perform the join as if both were PostgreSQL tables.
Be careful with big or complicated queries though: of course, the performance will be worse than for a join of two local tables.
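A minimal sketch of the setup, run in the PostgreSQL database; the server, user, and table names below are hypothetical:

    CREATE EXTENSION oracle_fdw;

    -- Point a foreign server at the Oracle instance.
    CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw
        OPTIONS (dbserver '//oracle-host:1521/ORCL');

    CREATE USER MAPPING FOR CURRENT_USER SERVER oradb
        OPTIONS (user 'scott', password 'tiger');

    -- Column definitions must match the Oracle table.
    CREATE FOREIGN TABLE ora_customers (
        customer_id   integer,
        customer_name text
    ) SERVER oradb OPTIONS (schema 'SCOTT', table 'CUSTOMERS');

    -- The Oracle table can now be joined like a local one.
    SELECT o.order_id, c.customer_name
    FROM   local_orders o
    JOIN   ora_customers c ON c.customer_id = o.customer_id;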

Spark DataFrames and HBase design: one table with multiple column families vs. multiple tables with one column family

I have multiple tables in an Oracle database and I would like to copy them to HBase. What is the best design: one HBase table with multiple column families, where each column family represents an Oracle table? Multiple HBase tables, each with one column family containing all the fields? Or multiple tables with multiple column families, where each column family contains one column qualifier?
After that, I would like to use Spark DataFrames to run jobs and query the data much as I would in Oracle.
Which strategy would you use?
Regards
Using multiple column families (more than three) in one table is discouraged; please see the HBase manual.
So the other options you mentioned are better suited to your requirements and your kind of design.

Where to do a join to flatten tables: Hive or Oracle?

I have seven normalized tables in Oracle that I need to flatten (some columns, not all) to work with map-reduce jobs. I have two choices: do the join in Oracle and use Sqoop to import the joined table into HDFS, or import the tables one by one and then do the join in Hive itself.
Is there any difference between the two approaches, any pros or cons?
Thank you.
I am comfortable with both Oracle and Hive. In this case it seems reasonable to do the join in Oracle; you can ensure that all the moving parts are in sync and available.
You may also consider creating an Oracle view that embodies the join, as sketched below. You can then more repeatably verify and extract the contents of the various tables into your single denormalized one.
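A sketch of what such a view could look like, with hypothetical table and column names standing in for the seven normalized tables:

    -- Hypothetical Oracle view embodying the join; Sqoop can then import
    -- its contents into HDFS as a single denormalized table.
    CREATE OR REPLACE VIEW flat_orders_v AS
    SELECT o.order_id,
           o.order_date,
           c.customer_name,
           p.product_name
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    JOIN   products  p ON p.product_id  = o.product_id;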

MINUS query in SQL Developer

I have two active database connections in SQL Developer, say DB1 and DB2. I am working on ETL validation, so I want to check whether data from Table1 in DB1 is populated correctly into Table2 in DB2.
How can I write a query that accesses tables from these two connections?
Any help on this would be appreciated.
There are common ways to validate that the ETL is correct:
You can run two queries to compute the row counts separately against Table1 in DB1 and Table2 in DB2, and compare the counts.
Or apply aggregate functions such as sum() or avg() to corresponding columns of the tables in DB1 and DB2 and compare the results.
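A minimal sketch, run once against each connection, assuming a hypothetical numeric column named amount; if the counts and sums match, the load is at least plausibly correct:

    -- Run against DB1:
    SELECT COUNT(*) AS row_cnt, SUM(amount) AS amount_sum FROM Table1;

    -- Run against DB2, then compare the two result rows:
    SELECT COUNT(*) AS row_cnt, SUM(amount) AS amount_sum FROM Table2;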
