Issue with sqoop import from mysql to hbase - hadoop

I am trying to import data from mysql to hbase using sqoop:
sqoop import --connect jdbc:mysql://<hostname>:3306/test --username USERNAME -P --table testtable --direct --hbase-table testtable --column-family info --hbase-row-key id --hbase-create-table
The process runs smoothly, without any error, but the data ends up in HDFS and not in HBase.
Here is my setup:
HBase and Hadoop are installed in distributed mode on my three-server cluster. The NameNode and HBase Master are on one server, and the DataNodes and RegionServers are on the two other servers. Sqoop is installed on the NameNode server only.
I am using Hadoop version 0.20.2-cdh3u3, hbase version 0.90.6-cdh3u4 and sqoop version 1.3.0-cdh3u3.
Any suggestions on what I am doing wrong?

Sqoop's direct connectors usually do not support HBase, and this is definitely the case for the MySQL direct connector. You should drop the --direct option if you need to import data into HBase.
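For example, the command from the question with --direct dropped (everything else unchanged) should send the rows into the HBase table instead of HDFS:
sqoop import --connect jdbc:mysql://<hostname>:3306/test --username USERNAME -P --table testtable --hbase-table testtable --column-family info --hbase-row-key id --hbase-create-table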

Here is an example of importing data from MySQL to HBase:
http://souravgulati.webs.com/apps/forums/topics/show/8680714-sqoop-import-data-from-mysql-to-hbase

Related

unable to list the oracle table names with sqoop

I am trying to connect to oracle db and list the names of the tables with sqoop like this:
sqoop list-tables --connect jdbc:oracle:thin:#<db server>:1521:DB_Name--username hdp --password hadoop
I don't get any errors back. There are a bunch of tables on the database server, but I cannot get them listed with Sqoop. I temporarily gave DBA rights to the hdp user, but I still cannot get the list of tables. Any ideas what I am missing?
You should add a space before the double dash:
sqoop list-tables --connect jdbc:oracle:thin:#<db server>:1521:DB_Name --username hdp --password hadoop
And from what I saw in the documentation, the format should be something like:
sqoop --connect jdbc:oracle//<db server>:1521/DB_Name --username hdp --password hadoop --list-tables
If you only need the list of tables in Oracle, why not just use sqlplus?
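For instance, a rough sketch using the credentials from the question (the EZConnect string below is an assumption about how your listener is reachable): connect with sqlplus, then list the current user's tables at the SQL prompt.
sqlplus hdp/hadoop@//<db server>:1521/DB_Name
SELECT table_name FROM user_tables;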

Want to copy oracle data to hadoop

The tutorial is in the link mentioned below:
java-spark-tutorial
I have loaded data into Oracle. Now I need to import it into Hadoop. I am new to Hadoop, but I am familiar with Ambari. Can anyone please suggest how we can load data from Oracle into Hadoop using the Ambari tool?
You can import rows from Oracle into Hadoop using Sqoop. A typical command would be:
sqoop import --connect jdbc:oracle:thin:<username>/<password>@<IP address>:1521:<db name> --username <username> -P --table <database name>.<table name> --columns "<column names>" --target-dir <target directory path in hdfs> -m 1

Sqoop Import from Hive to Hive

Can we import tables from one Hive DataSource to another Hive DataSource using Sqoop?
The query looks like this:
sqoop import --connect jdbc:hive2://localhost:10000/default --driver org.apache.hive.jdbc.HiveDriver --username root --password root --table student1 -m 1 --target-dir hdfs://localhost:9000/user/dummy/hive2result
Right now it's throwing the below exception:
15/07/19 19:50:18 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Method not supported
java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:141)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:290)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Sqoop is not a tool for transferring data from one Hive instance to another Hive instance. It seems like your requirement is to transfer data in Hive from one cluster to another cluster. This can be achieved using hadoop distcp. The full form of Sqoop itself is SQL to Hadoop and vice versa.
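For example, a table's warehouse directory could be copied across clusters with something along these lines (the NameNode addresses and warehouse path are placeholders; the table name is taken from the question above):
hadoop distcp hdfs://source-namenode:8020/user/hive/warehouse/student1 hdfs://target-namenode:8020/user/hive/warehouse/student1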
If you want to migrate multiple databases and tables from one Hive instance to another, the best approach is to transfer the data using hadoop distcp and trigger the DDLs in the 2nd Hive instance. If you don't have the DDLs handy, no need to worry (a rough command sketch of these steps follows below):
Just take a dump of the metastore database.
Open the dump file in a notepad or text editor.
Replace the old HDFS URI with the new HDFS URI.
Import the MySQL dump into the metastore of the 2nd Hive instance.
Refresh the tables.
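A rough sketch of those steps, assuming a MySQL-backed metastore database named metastore and placeholder NameNode addresses:
# on the source cluster's metastore host
mysqldump -u hive -p metastore > metastore_dump.sql
# rewrite the old HDFS URI to the new one
sed -i 's|hdfs://old-namenode:8020|hdfs://new-namenode:8020|g' metastore_dump.sql
# on the target cluster's metastore host, load the edited dump
mysql -u hive -p metastore < metastore_dump.sql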
An example is given in the blog post below:
https://amalgjose.wordpress.com/2013/10/11/migrating-hive-from-one-hadoop-cluster-to-another-cluster-2/
Note that distcp will work only for external tables. For managed (transactional) tables, use the EXPORT/IMPORT DDL.
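For a managed table, a minimal sketch of the EXPORT/IMPORT route (the table name comes from the question above; the export path is a placeholder):
-- on the source cluster
EXPORT TABLE student1 TO '/user/dummy/export/student1';
-- on the target cluster, after copying the export directory across (e.g. with distcp)
IMPORT TABLE student1 FROM '/user/dummy/export/student1';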

How to specify the Hive database name on the command line while importing data from an RDBMS into Hive using Sqoop?

I need to import data from an RDBMS table into a remote Hive machine. How can I achieve this using Sqoop?
In a nutshell, how do I specify the Hive database name and the Hive machine IP in the import command?
Please help me with the appropriate Sqoop command.
You should run the Sqoop command on the machine where you have Hive installed, because Sqoop will look for $HIVE_HOME/bin/hive to execute the CREATE TABLE ... and other statements.
Alternatively, you could use Sqoop with the --hive-home command line option to specify where your Hive is installed (it just overrides $HIVE_HOME).
To connect to your remote RDBMS:
sqoop import --connect jdbc:mysql://remote-server/mytable --username xxx --password yyy
To import into Hive:
sqoop import --hive-import
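Putting the two together, a sketch of a full command (names are placeholders; one common way to target a specific Hive database is to qualify --hive-table as database.table):
sqoop import --connect jdbc:mysql://remote-server/mydb --username xxx --password yyy --table mytable --hive-import --hive-table mydatabase.mytable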
You can get a more comprehensive list of commands by looking at this link: http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_literal_sqoop_import_literal

Error in sqoop import query

Scenario:
I am trying to import data from MS SQL Server into HDFS, but I am getting the following errors:
Errors:
hadoop#ubuntu:~/sqoop-1.1.0$ bin/sqoop import --connect 'jdbc:sqlserver://localhost;username=abcd;password=12345;database=HadoopTest' --table PersonInfo
11/12/09 18:08:15 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.1
java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.1
at com.cloudera.sqoop.shims.ShimLoader.loadShim(ShimLoader.java:190)
at com.cloudera.sqoop.shims.ShimLoader.getHadoopShim(ShimLoader.java:109)
at com.cloudera.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:173)
at com.cloudera.sqoop.tool.ImportTool.init(ImportTool.java:81)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:411)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:170)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:196)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:205)
Question:
I have configured Sqoop successfully, so what could be the problem? I also tried connecting to the database by entering the IP address, but I get the same problem.
How can I fix this error? Please suggest a solution.
Thanks.
Sqoop is now an incubator project in Apache. There is no reason Sqoop should only run with CDH and not Apache Hadoop.
The Sqoop documentation says Sqoop is compatible with Apache Hadoop 0.21 and Cloudera's Distribution of Hadoop version 3. So, I think using the correct version of Apache Hadoop will also solve the problem.
SQOOP-82 is more than a year old, and there have been changes since then.
FYI, Sqoop was made part of the Hadoop 0.21 branch and was removed from Hadoop after it moved to the Apache Incubator.
Please check this issue:
Sqoop does not run with Apache Hadoop 0.20.2. The only supported platform is CDH 3 beta 2. It requires features of MapReduce not available in the Apache 0.20.2 release of Hadoop. You should upgrade to CDH 3 beta 2 if you want to run Sqoop 1.0.0.
In your sqoop import command, you are missing the driver value; specify it with --driver.
Maybe this will help.
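For example, the command from the question with the SQL Server JDBC driver class added (this assumes the Microsoft JDBC driver jar has been placed in Sqoop's lib directory):
bin/sqoop import --connect 'jdbc:sqlserver://localhost;username=abcd;password=12345;database=HadoopTest' --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --table PersonInfo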
I think you should try this; it may solve your problem:
Add the port number of the SQL server. For the port number, check your my.cnf (/etc/mysql/my.cnf) file.
Try this command with the port number and schema:
sqoop import --connect jdbc:mysql://localhost:3306/mydb -username root -password password --table emp --m 1
