I am trying to import all tables in a specific schema from an Oracle database with this Sqoop command:
sqoop import-all-tables --connect jdbc:oracle:thin:server:port:database --username x --password y --warehouse-dir warehouse-dir --hive-import --create-hive-table
But this Oracle database has several schemas, and I need to import all tables from only one specific schema.
You should use the additional parameter --exclude-tables <tables>, which takes a comma-separated list of tables to exclude from the import process.
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
If you use import-all-tables, the tables imported will be those on which the user connecting over JDBC has the SELECT privilege; essentially, the tables that user sees in ALL_TABLES.
To limit the list, another option is to connect as a different user, one whose SELECT privileges are restricted to the tables you want to import.
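For example, a minimal sketch of that approach, connecting as the owner of the target schema so that import-all-tables only sees that schema's tables (the host, port, SID, credentials and paths are placeholders):
sqoop import-all-tables \
  --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
  --username TARGET_SCHEMA_OWNER --password secret \
  --warehouse-dir /user/hive/warehouse \
  --hive-import --create-hive-table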
I have a table in Oracle (table name TRCUS) with customer details, partitioned by year and month.
Partition names in Oracle:
PERIOD_JAN_13,
PERIOD_FEB_13,
PERIOD_JAN_14,
PERIOD_FEB_14 etc
Now I want to import this table's data into Hive using Sqoop directly.
The Sqoop job should create a Hive table, dynamically create partitions based on the Oracle table's partitions, and then import the data into the respective Hive partitions.
How can this be achieved using Sqoop?
Unfortunately, this cannot be done directly with Sqoop. However, there is a workaround you may not know about.
Create a table in Hive without any partitions.
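A minimal sketch of such an unpartitioned staging table, reusing the column names from the INSERT OVERWRITE further below (the column types are assumptions):
CREATE TABLE pd_withoutpartition (id INT, `count` INT, name STRING);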
Set the dynamic partition mode:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
Import data into the non-partitioned Hive table using Sqoop:
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/database1" --username root --password cloudera --query 'SELECT DISTINCT id, count from test WHERE $CONDITIONS' --target-dir /user/hive/warehouse/ --hive-table pd_withoutpartition --hive-database database1 --hive-import --hive-overwrite -m 1 --direct
Create another table with partitions
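For example, a hypothetical DDL for the partitioned table used in the next step (the column types are assumptions):
CREATE TABLE pd_partition (id INT, `count` INT)
PARTITIONED BY (name STRING);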
Overwrite the partitioned table from the previous table:
INSERT OVERWRITE TABLE pd_partition partition(name) SELECT id, count, name from pd_withoutpartition;
Note: Make sure that the column you want to partition by is listed last in the SELECT of the INSERT OVERWRITE statement.
Hive version: Hive 1.1.0-cdh5.13.1
We are using Sqoop's Netezza direct mode to import data from Netezza to Hadoop as part of a POC.
We have a couple of questions, some Netezza-specific and some about the Netezza-Sqoop integration.
Q1. Does Sqoop direct mode always require the CREATE EXTERNAL TABLE and DROP privileges to perform a direct transfer?
Q2. Does an external table get created in Netezza? If yes, in which database? I see Sqoop issuing the query below:
CREATE EXTERNAL TABLE '/yarn/local/usercache/someuser/appcache/application_1483624176418_42787/work/task_1483624176418_42787_m_000000/nzexttable-0.txt'
USING (REMOTESOURCE 'JDBC'
  BOOLSTYLE 'T_F'
  CRINSTRING FALSE
  DELIMITER 44
  ENCODING 'internal'
  FORMAT 'Text'
  INCLUDEZEROSECONDS TRUE
  NULLVALUE 'null'
  MAXERRORS 1)
AS SELECT * FROM SOME_TBL WHERE (DATASLICEID % 3)
Does it create it in the database selected in the JDBC URL? jdbc:netezza://somehostname:5480/SOME_DB_1
Q3. If Netezza needs to create external tables, can it create the external table in a different database from the one that holds the actual table whose data needs to be pulled into Hadoop? What configuration change would be needed?
Q4. Does Sqoop run DROP TABLE on the external tables created by the individual mappers?
Sqoop command used:
export HADOOP_CLASSPATH=/opt/nz/lib/nzjdbc3.jar
sqoop import -D mapreduce.job.queuename=some_queue \
  -D yarn.nodemanager.local-dirs=/tmp -D mapreduce.map.log.level=DEBUG \
  --direct --connect jdbc:netezza://somehost:5480/SOME_DB --table SOME_TBL_1 \
  --username SOMEUSER --password xxxxxxx --target-dir /tmp/netezza/some_tbl_file \
  --num-mappers 2 --verbose
This is the reply I got from the Sqoop user community (thanks to Szabolcs Vasas).
In the case of Netezza direct imports, Sqoop executes a CREATE EXTERNAL TABLE command (so you will need the CREATE EXTERNAL TABLE privilege) to back up the content of the table to a temporary file, and it then copies the content of this file to the final output on HDFS.
The SQL command you pasted is indeed the one executed by Sqoop, but as far as I understand from the Netezza documentation (http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_expls.html, 6th example), it does not really create a new external table in any schema; it just backs up the content of the table, and because of that no DROP TABLE statement is executed.
Q1. Yes, Sqoop needs the CREATE EXTERNAL TABLE privilege, but not DROP.
Q2. Sqoop does not really create a new external table in any schema; it just backs up the content of the table (http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_expls.html, 6th example).
Q3. It is not possible to create the external table in a specific schema.
Q4. No, Sqoop does not run a DROP command.
Moreover, the table created by the Sqoop direct process is a Netezza TET (transient external table). The external REMOTESOURCE 'JDBC' table is therefore dropped once the mapper has received the data through a named FIFO, so no tables remain in Netezza after the transfer.
How do I do a multi-table or selective table import in Sqoop from MySQL?
I can see that we can either import table by table or use import-all-tables.
I am looking for a way to import only certain tables from MySQL.
Any help would be appreciated.
Thanks
There is no way to specify multiple tables in a single sqoop import command.
But you can use import-all-tables with --exclude-tables <tables> to exclude some tables while importing all the other tables of a database.
For example:
You have 5 tables (tbl1, tbl2, tbl3, tbl4, tbl5) in the testdb database and you want to exclude tbl2 and tbl4.
Use the following command:
sqoop import-all-tables --connect jdbc:mysql://localhost:3306/testdb --exclude-tables tbl2,tbl4 .....
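If you would rather name the tables to include than exclude, a common workaround is to run one import per table from a small shell loop (a sketch only; the table names, credentials and directories are placeholders):
for tbl in tbl1 tbl3 tbl5; do
  sqoop import --connect jdbc:mysql://localhost:3306/testdb \
    --username user --password pass \
    --table "$tbl" --warehouse-dir /user/hive/warehouse
done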
Using Sqoop I can create a managed (internal) Hive table, but not an external table.
Please let me know the best practices for unloading data from a data warehouse and loading it into Hive external tables.
The tables in the warehouse are partitioned; some are partitioned by date, others by state.
Please share the approaches or practices you use in production environments.
Sqoop does not support creating Hive external tables. Instead you might:
Use the Sqoop codegen command to generate the SQL for creating the Hive internal table that matches your remote RDBMS table (see http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal)
Modify the generated SQL to create a Hive external table
Execute the modified SQL in Hive
Run your Sqoop import command, loading into the pre-created Hive external table (a sketch of steps 2-4 follows below)
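A minimal sketch of steps 2-4, assuming a MySQL source; the database, table, columns, delimiter, paths and credentials are placeholders, and the DDL stands in for whatever the generated CREATE TABLE statement contains for your table:
# Steps 2-3: rewrite the generated DDL as an external table and run it in Hive.
hive -e "CREATE EXTERNAL TABLE mydb.customers (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/external/customers';"
# Step 4: import straight into the external table's HDFS location.
sqoop import --connect jdbc:mysql://dbhost/mydb \
  --username user --password pass \
  --table customers --fields-terminated-by ',' \
  --target-dir /data/external/customers --append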
Step 1: Import data from MySQL into a Hive table.
sqoop import \
  --connect jdbc:mysql://localhost/<database> \
  --username training --password training \
  --table <table-name> --hive-import --hive-table <hive-table-name> -m 1 \
  --fields-terminated-by ','
Step 2: In Hive, change the table type from managed to external.
ALTER TABLE <Table-name> SET TBLPROPERTIES('EXTERNAL'='TRUE');
Note: you can import directly into a Hive table, or else into HDFS (the back end of Hive) and then build the table on top of it.
My best suggestion is to Sqoop your data to HDFS and create an EXTERNAL table over it for raw operations and transformations.
Finally, load the mashed-up (transformed) data into an internal table. I believe this is one of the best practices to get things done properly.
Hope this helps!!!
Refer to these links:
https://mapr.com/blog/what-kind-hive-table-best-your-data/
In the first link, if you want to skip ahead, go directly to section 2.2.1, External or Internal.
https://hadoopsters.net/2016/07/15/hive-tables-internal-and-external-explained/
After reading the first link, the second will clarify most of your questions.
Cheers!!
I can import SQL tables into Hive, but when I try to import a SQL view, I get errors.
Any ideas?
From the Sqoop documentation:
If your table has no index column, or has a multi-column key, then you must also manually choose a splitting column.
I think that's your case here. Provide an additional option like:
--split-by tablex_id
where tablex_id is a field in your view that can be used as a primary key or index.
There is no special command to import a view from an RDBMS.
Use the regular import command with --split-by, which Sqoop treats the same way as a primary key.
You can import an RDBMS view's data, but you have to pass -m 1 or --split-by in your command.
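For example, a hedged sketch of importing a view (the view name monthly_sales_view, the split column customer_id, and the connection details are all placeholders):
# Split the import on a column of the view with a reasonably uniform
# distribution; alternatively, drop --split-by and pass "-m 1" instead.
sqoop import --connect jdbc:mysql://dbhost:3306/salesdb \
  --username user --password pass \
  --table monthly_sales_view --split-by customer_id \
  --target-dir /user/data/monthly_sales_view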