Using Sqoop, is it possible to perform hive import from an Oracle View instead of table? - sqoop

Assume I have an Oracle view created by joining few other tables, it is possible to import the view data into Hive.
Thanks in advance

Yes it is possible as it is an object for sqoop and if we can fire a query, we can fetch data via sqoop. This can be also be checked from sqoop eval command and you can try to run to see if view is accessible or not.
sqoop eval (generic-args) (eval-args)

Related

How to generate Sqoop command from Sqoop options?

I want to generate Sqoop commands to import RDBMS tables from Mysql. I have the sqoop commands stored DB and later the Oozie workflows are created with this command. I write the Sqoop commands manually now.
Is there a way i could create an object of SqoopOptions , set the values and generate sqoop queries out of the object ?
Yes you could. You'll want to call/extend the classes here.

How to transfer data & metadata from Hive to RDBMS

There are more than 300 tables in my hive environment.
I want to export all the tables from Hive to Oracle/MySql including metadata.
My Oracle database doesn't have any tables corresponding to these Hive tables.
Sqoop import from Oracle to Hive creates tables in Hive if the table doesn't exists.But Sqoop export from Hive to Oracle doesn't create table if not exists and fails with an exception.
Is there any option in Sqoop to export metadata also? or
Is there any other Hadoop tool through which I can achieve this?
Thanks in advance
The feature you're asking for isn't in Spark. I don't know of a current hadoop tool which can do what you're asking either unfortunately. A potential workaround is using the "show create table mytable" statement in Hive. It will return the create table statements. You can parse this manually or pragmatically via awk and get the create tables in a file, then run this file against your oracle db. From there, you can use sqoop to populate the tables.
It won't be fun.
Sqoop can't copy metadata or create table in RDBMS on the basis of Hive table.
Table must be there in RDBMS to perform sqoop export.
Why is it so?
Mapping from RDBMS to Hive is easy because hive have only few datatypes(10-15). Mapping from multiple RDBMS datatypes to Hive datatype is easily achievable. But vice versa is not that easy. Typical RDBMS has 100s of datatypes (that too different in different RDBMS).
Also sqoop export is newly added feature. This feature may come in future.

How to create external table in Hive using sqoop. Need suggestions

Using sqoop I can create managed table but not the externale table.
Please let me know what are the best practices to unload data from data warehouse and load them in Hive external table.
1.The tables in warehouse are partitioned. Some are date wise partitioned some are state wise partitioned.
Please put your thoughts or practices used in production environment.
Sqoop does not support creating Hive external tables. Instead you might:
Use the Sqoop codegen command to generate the SQL for creating the Hive internal table that matches your remote RDBMS table (see http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal)
Modify the generated SQL to create a Hive external table
Execute the modified SQL in Hive
Run your Sqoop import command, loading into the pre-created Hive external table
Step 1: import data from mysql to hive table.
sqoop import
--connect jdbc:mysql://localhost/
--username training --password training
--table --hive-import --hive-table -m 1
--fields-terminated-by ','
Step 2: In hive change the table type from Managed to External.
Alter table <Table-name> SET TBLPROPERTIES('EXTERNAL'='TRUE')
Note:you can import directly into hive table or else to back end of hive.
My best suggestion is to SQOOP your data to HDFS and create EXTERNAL for Raw operations and Transformations.
Finally mashed up data to the internal table. I believe this is one of the best practices to get things done in a proper way.
Hope this helps!!!
Refer to these links:
https://mapr.com/blog/what-kind-hive-table-best-your-data/
In the above if you want to skip directly to the point -->2.2.1 External or Internal
https://hadoopsters.net/2016/07/15/hive-tables-internal-and-external-explained/
After referring to the 1st link then second will clarify most of your questions.
Cheers!!

Dynamic Sqoop Query and scheduling

My Sqoop query is stored in a DB. I need to fetch that query and execute it.
Intern that query will archieve the oracle tables and store in the HDFS.
Can anyone please let me know how can I retrieve the row from the DB(containing sqoop query) and execute it.
I am new to Hadoop/Sqoop/Hive.
You need programming language support or you can use shell script.

Sqooping same table for different schema in parallal is failing

We are having different data base schemas in Oracle. We are planning to sqoop some of the tables from oracle to Hive ware house. But If we put sqooping of tables of an oltp is sequential it is working. But to have a better usage we are planning to sqoop different oltps tables parallay, but it is faling to sqoop same table parallay.
It seems while sqooping a Table, one temporary table will be created in hdfs by sqoop and from there it will move the data to hive table, because of that reason we are not able to sqoop parallay.
Is there any way that we sqoop same tables parallay.
You can use parameter --target-dir to specify arbitrary temporary directory on HDFS where Sqoop will import data first. This parameter should be working in conjunction with --hive-import.

Resources