I am trying to create a proof of concept showing that it is possible for a company to migrate their data from their Oracle 12c DB to a Hadoop system.
To do this I have started an Oracle Linux 7 instance on AWS. I am planning to install Oracle 12c on it. After that I have to create dummy tables and send them to Hadoop.
Can this be done? From my initial research I can see that Sqoop and Oracle GoldenGate can do that. Also, if this goes live I would have to transfer billions of records from Oracle 12c to HDFS.
Any help or advice is much appreciated.
You can use Sqoop to transfer your Oracle data to Hive. You can read this for more info
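As a rough illustration of the Sqoop route, the sketch below builds a `sqoop import` command line for a single Oracle table. The host, service name, user, table, split column, and password file path are hypothetical placeholders, and it assumes Sqoop and the Oracle JDBC driver are already installed on a Hadoop node. It only assembles and prints the command; it does not run it.

```python
import shlex


def build_sqoop_import(host, service, user, password_file, table,
                       split_by, num_mappers=8, port=1521, hive=True):
    """Return the argument list for a `sqoop import` of one Oracle table."""
    args = [
        "sqoop", "import",
        # Oracle thin-driver JDBC URL using a service name
        "--connect", f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "--username", user,
        # Read the password from a file instead of passing it on the command line
        "--password-file", password_file,
        "--table", table,
        # Column used to divide the table into ranges, one range per mapper
        "--split-by", split_by,
        "--num-mappers", str(num_mappers),
    ]
    if hive:
        # Load the imported data into a Hive table as well as HDFS
        args.append("--hive-import")
    return args


# All values below are hypothetical placeholders for a proof of concept.
cmd = build_sqoop_import(
    host="oracle-poc.example.com",
    service="ORCLPDB1",
    user="poc_user",
    password_file="/user/poc/oracle.password",
    table="POC.DUMMY_ORDERS",
    split_by="ORDER_ID",
)
print(shlex.join(cmd))
```

For a table with billions of rows, the number of mappers and the choice of split column are likely the first things to tune, since they decide how the import is parallelized.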
Related
I have two virtual servers, with Oracle 19c installed on only one of them. I need to install another Oracle database on the second server and set up database clustering between the two servers. How do I do this? Can it be done using Windows Cluster?
1. You cannot use Windows Cluster to deploy Oracle RAC. You should use Oracle's own software (Oracle Clusterware) to do it.
2. To deploy Oracle RAC:
a. If you installed the database as a single instance, you should first convert it to RAC and then add the second node to the cluster through the Oracle addnode procedure.
b. If your installation is already RAC, you should complete the prerequisites on the second node and then add it using the Oracle addnode script. In recent versions of Oracle, addnode also has a graphical interface.
I am pretty new to Oracle GoldenGate and wanted to understand if it is possible to create a bidirectional sync between Oracle 12x and Cassandra (DSE) using Oracle GoldenGate. I searched several places on the internet, but most examples replicate data between Oracle databases. I started wondering if it is even possible to do so. Can anyone help me with any documentation?
There is a separate module called Oracle GoldenGate for Big Data. It supports many NoSQL replication targets.
One of the supported Big Data targets is Apache Cassandra.
There is a separate manual explaining how to use it.
There is no separate module that allows you to connect Apache Cassandra as the source of your replication. If you need such replication you need to provide some intermediate step. The source of replication for Oracle GoldenGate can only be a database (Oracle, TimesTen, DB2, Informix, MySQL, MS SQL Server, NonStop SQL/MX, SAP/Sybase ASE, Teradata) or a JMS queue.
We are trying to do a proof of concept on Informatica Big Data Edition (not the cloud version), and I have seen that we might be able to use HDFS and Hive as source and target. But my question is: does Informatica connect to Cloudera Impala? If so, do we need any additional connector for that? I have done comprehensive research to check if this is supported but could not find anything. Has anyone already tried this? If so, can you specify the steps and link to any documentation?
Informatica version: 9.6.1 (Hotfix 2)
You can use the ODBC driver provided by Cloudera.
http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html
For Irene, you can use the same driver; the one above is based on the Simba driver.
http://www.simba.com/drivers/hbase-odbc-jdbc/
I have a use case where I need to import (Sqoop) Microsoft Access data into HDFS. Are there any drivers available for MS Access to Sqoop the data? Has anyone come across such a case?
Please drop your comments and observations.
It looks like there is no support for Access. Here is the list of supported databases from Sqoop. The nearest is Microsoft SQL Server. The main requirement is a JDBC driver to connect to the database.
I have been assigned to move a 10g database to AWS Oracle 12c. There is around 20 GB of actual data, but 900 GB has been allocated, so we will need to decrease that. Any suggestions on how to migrate and reduce the size of the database?
This is a good summary of several options:
Importing Data Into Oracle on Amazon RDS
Also here is the Oracle migration Whitepaper.
The solution depends on the size of your data and the accepted downtime.
Personally I used Data Pump in a project; it works well for the size of data you have. I'm not sure about the resizing.