Sqoop Import from Hive to Hive

Can we import tables from one Hive data source to another Hive data source using Sqoop?
The query looks like this:
sqoop import --connect jdbc:hive2://localhost:10000/default --driver org.apache.hive.jdbc.HiveDriver --username root --password root --table student1 -m 1 --target-dir hdfs://localhost:9000/user/dummy/hive2result
Right now it's throwing the exception below:
15/07/19 19:50:18 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Method not supported
java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:141)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:290)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Sqoop is not a tool for transferring data from one Hive instance to another Hive instance. It sounds like your requirement is to transfer Hive data from one cluster to another, which can be achieved using hadoop distcp. The name Sqoop itself stands for SQL to Hadoop and vice versa.
If you want to migrate multiple databases and tables from one Hive instance to another, the best approach is to transfer the data using hadoop distcp and then run the DDLs on the second Hive instance. If you don't have the DDLs handy, no need to worry:
1. Take a dump of the metastore database.
2. Open the dump file in a text editor.
3. Replace the old HDFS URI with the new HDFS URI.
4. Import the MySQL dump into the metastore of the second Hive instance.
5. Refresh the tables.
An example is given in the blog post below; a command-level sketch follows the link.
https://amalgjose.wordpress.com/2013/10/11/migrating-hive-from-one-hadoop-cluster-to-another-cluster-2/
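A rough sketch of that flow, assuming a MySQL-backed metastore and hypothetical cluster names (namenode1 is the source, namenode2 the target); adjust hosts, ports, paths and credentials to your environment:
# copy the warehouse data between clusters
hadoop distcp hdfs://namenode1:8020/user/hive/warehouse hdfs://namenode2:8020/user/hive/warehouse
# dump the source metastore, rewrite the HDFS URI, and load it into the target metastore
mysqldump -u root -p metastore > metastore.sql
sed -i 's|hdfs://namenode1:8020|hdfs://namenode2:8020|g' metastore.sql
mysql -u root -p metastore < metastore.sql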

distcp will work only for external tables. For managed (transactional) tables, use Hive's EXPORT/IMPORT DDL instead.
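A minimal EXPORT/IMPORT sketch for a managed table (database, table and export path are placeholders; the exported directory is copied across clusters with distcp before importing on the other side):
hive -e "EXPORT TABLE my_db.my_table TO '/tmp/my_table_export'"
hadoop distcp hdfs://source-nn:8020/tmp/my_table_export hdfs://target-nn:8020/tmp/my_table_export
# run on the target cluster
hive -e "IMPORT TABLE my_db.my_table FROM '/tmp/my_table_export'"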

Related

How can we import all tables from an RDBMS into a Hive custom database?

I want to "import-all-tables" using sqoop from mysql to a Hive Custom Database ( Not Hive default Database )
Steps tried:
Create a custom database in hive under "/user/hive/warehouse/Custom.db"
Assigned all permissions for this directory- so there will be NO issues in writing into this directory by sqoop.
Used below command with option "--hive-database" option on CDH5.7 VM :
sqoop import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--hive-database "/user/hive/warehouse/sqoop_import_retail.db"
Tables got created in the Hive default database only, not in the custom DB ("sqoop_import_retail.db").
Otherwise it tries to create the tables in the previous HDFS directories (/user/cloudera/categories) and errors out stating the directory already exists:
16/08/30 00:07:14 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
16/08/30 00:07:14 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
[cloudera@quickstart etc]$
How do I address these issues?
1. Creating tables in the custom Hive DB.
2. Flushing the previous directory references from Sqoop.
You did not mention --hive-import in your command, so it will import to HDFS under /user/cloudera/ in your case.
You are executing the query again; that's why you are getting the exception
Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
Modify the import command:
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --hive-database custom --hive-import
It will fetch all the tables from retail_db in MySQL and create the corresponding tables in the custom database in Hive.
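Note that Sqoop will not create the Hive database for you, and a retry of the earlier plain HDFS import needs the stale output directory removed first. A small preparation sketch (the database name and leftover path are taken from the examples above):
hive -e "CREATE DATABASE IF NOT EXISTS custom"
# only needed if you want to retry the earlier run that wrote straight to HDFS
hdfs dfs -rm -r /user/cloudera/categories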

Hive External Table

I am trying to import data from Oracle to Hive using Sqoop.
I used the command below once; now I want to overwrite the existing data with new data (a daily action), so I ran the same command again.
sqoop import --connect jdbc:oracle:thin:@UK01WRS6014:2184:WWSYOIT1 \
--username HIVE --password hive --table OIDS.ALLOCATION_SESSION_DIMN \
--hive-overwrite --hive-database OI_DB --hive-table ALLOCATION_SESSION_DIMN
But I am getting a "file already exists" error:
14/10/14 07:43:59 ERROR security.UserGroupInformation:
PriviledgedActionException as:axchat
(auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException:
Output directory
hdfs://uslibeiadg004.aceina.com:8020/user/axchat/OIDS.ALLOCATION_SESSION_DIMN
already exists
The tables that I created in Hive were all external tables.
As with MapReduce, do we have to delete that directory every time we execute the same command?
Any help would be highly appreciated.
When you delete from an EXTERNAL table you only delete objects in the Hive metastore: you don't delete the files over which that table is superimposed. A non-external table belongs solely to Hive and, when deleted, will result in both the metastore data AND the HDFS data being removed.
So you can either delete the HDFS data explicitly, or define the table as internal to Hive.
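One way to clear the old data explicitly before re-running the import, using the directory from the error message above (recent Sqoop versions can also do this for you via --delete-target-dir):
hdfs dfs -rm -r hdfs://uslibeiadg004.aceina.com:8020/user/axchat/OIDS.ALLOCATION_SESSION_DIMN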

Issue with sqoop import from mysql to hbase

I am trying to import data from MySQL to HBase using Sqoop:
sqoop import --connect jdbc:mysql://<hostname>:3306/test --username USERNAME -P --table testtable --direct --hbase-table testtable --column-family info --hbase-row-key id --hbase-create-table
The process runs smoothly, without any error, but the data goes to HDFS and not to HBase.
Here is my setup:
HBase and Hadoop are installed in distributed mode on my three-server cluster, with the NameNode and HBase Master on one server and the DataNodes and RegionServers on the other two servers. Sqoop is installed on the NameNode server only.
I am using Hadoop version 0.20.2-cdh3u3, HBase version 0.90.6-cdh3u4 and Sqoop version 1.3.0-cdh3u3.
Any suggestions on where I am going wrong?
Sqoop's direct connectors usually do not support HBase, and this is definitely the case for the MySQL direct connector. You should drop the --direct option if you need to import data into HBase.
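For example, the same command as above with --direct removed (everything else unchanged) should load into HBase rather than plain HDFS:
sqoop import --connect jdbc:mysql://<hostname>:3306/test --username USERNAME -P --table testtable --hbase-table testtable --column-family info --hbase-row-key id --hbase-create-table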
Here is an example of importing data from MySQL to HBase:
http://souravgulati.webs.com/apps/forums/topics/show/8680714-sqoop-import-data-from-mysql-to-hbase

How to specify the Hive database name on the command line while importing data from an RDBMS into Hive using Sqoop?

I need to import data from an RDBMS table into a remote Hive machine. How can I achieve this using Sqoop?
In a nutshell, how do I specify the Hive database name and the Hive machine's IP in the import command?
Please help me with the appropriate sqoop command.
You should run the sqoop command on the machine where you have Hive installed, because Sqoop will look for $HIVE_HOME/bin/hive to execute the CREATE TABLE and other statements.
Alternatively, you could pass Sqoop the --hive-home command line option to specify where your Hive is installed (it just overrides $HIVE_HOME).
To connect to your remote RDBMS:
sqoop import --connect jdbc:mysql://remote-server/mydatabase --table mytable --username xxx --password yyy
To import into Hive:
sqoop import --hive-import
You can get a more comprehensive list of options at http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_literal_sqoop_import_literal
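Putting the pieces together, a sketch of a full command (host, database, and table names are placeholders); --hive-database selects the target Hive database, and on older Sqoop versions you can instead qualify it in --hive-table as mydb.mytable:
sqoop import --connect jdbc:mysql://remote-server/mydb --table mytable --username xxx --password yyy --hive-import --hive-database mydb --hive-table mytable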

I couldn't import the tables from my SQL Server to Hive through Sqoop

When I pass the command:
$sqoop create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433;username=cloud;password=cloud123;database=hadoop' --table cluster
Some errors and warnings appear, and at the end it says:
Failed to start database '/var/lib/hive/metastore/metastore_db', see the next exception for details [again a list of import errors displayed]
Finally it says Hive exited with status 9.
What is the problem here? I am new to Sqoop and Hive. Can anyone please help me?
The correct syntax would be
sqoop import --connect 'jdbc:sqlserver://10.100.0.18:1433/hadoop' --username cloud --password cloud123 --table cluster --hive-import
I think you might want to check whether you have write permissions on the specified directory and whether a directory named metastore_db is being created.
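A quick, purely illustrative check, run from the directory where you invoked Sqoop (metastore_db is the embedded Derby database Hive creates there by default):
ls -ld . metastore_db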
This message is usually shown when you're running Sqoop with the default Hive configuration. Hive will by default use the embedded Derby datastore, which is usable only in very basic test use cases. I would recommend reconfiguring your Hive instance to use some other relational database as the metastore back end (MySQL, PostgreSQL, Oracle).
Your syntax is all wrong. The syntax is $sqoop tool-name [tool-arguments]
$sqoop import --create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433/hadoop' --username cloud --password cloud123 --table cluster
Pasting a sample call of a Hive import using Sqoop. This might help you correct your syntax further. Remember that, essentially, you need at minimum the command below to make it work.
sqoop import --connect jdbc:mysql://localhost/RAWDATA --table geolocation --username root --password hadoop --hive-import --create-hive-table --driver com.mysql.jdbc.Driver --m 1 --delete-target-dir
--connect: the part which reads /RAWDATA is the database name on your MySQL instance which contains the geolocation table. You can execute the 'show databases' and 'show tables' commands in MySQL to check your databases and tables.
--delete-target-dir is used for safety. It ensures Sqoop deletes the temporary directory it creates to write the file before moving the data into Hive. This avoids unnecessary 'directory already exists' errors in case you retry the command.
--create-hive-table is required only if you did not already create the target table in Hive. If a previous run of the Sqoop command created the table, you can leave this option out. Check your Hive database for the existence of the target Hive table.
--driver is a mandatory part of the command to perform any database connection. Make sure you either find the right path to the driver library or search for alternatives; you can try the one pasted above first to see if it does the trick, and come back to this forum for help.
Remember that we did not mention which Hive database the table will be created in, so it will go into Hive's default database. I am not giving that option since you are just getting started with Sqoop.
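Once the import finishes, a quick sanity check (not part of the import itself) to confirm the table landed in Hive's default database:
hive -e "USE default; SHOW TABLES; SELECT * FROM geolocation LIMIT 5"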
