How to import tables from SQL Server through Sqoop to HDFS

I have Hadoop, Hive, and Sqoop installed. I imported the table from my database to HDFS but couldn't import it into Hive. Do I need to configure any file in Hive? Also, when I browsed the web, the configuration shown is for MySQL, but I am using the jdbc:sqlserver driver.
Can anyone please help me, as I have been stuck with this for many days?

jdbc:mysql is for MySQL and won't work for SQL Server; I tried using it and it gave errors. I tried the command below and it worked wonderfully.
Command: import
Copy data from Database Table to HDFS File System
In the example below, our database and HDFS configuration is:
Server name: labDB
Database name: demo
SQL user name: sqoop
SQL password: simp1e
Driver class name: com.microsoft.sqlserver.jdbc.SQLServerDriver
Table: dbo.customers
Target directory: /tmp/dbo-customer (HDFS folder name)
Syntax:
sqoop import --connect jdbc:sqlserver://sqlserver-name \
--username <username> \
--password <password> \
--driver <driver-manager-class> \
--table <table-name> \
--target-dir <target-folder-name>
Sample:
sqoop import --connect "jdbc:sqlserver://labDB;database=demo" \
--username sqoop \
--password simp1e \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--table "dbo.customer" \
--target-dir "/tmp/dbo-customer"
https://danieladeniji.wordpress.com/2013/05/06/hadoop-sqoop-importing-data-from-microsoft-sql-server/

You should be able to import a table and see it in Hive using the --hive-import flag.
Check that you have defined all the global variables: HADOOP_HOME, SQOOP_HOME, and HIVE_HOME.
If it doesn't work for you, you can always use the CREATE EXTERNAL TABLE syntax in the meantime to make use of your imported data in Hive, as sketched below.
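For example, a minimal sketch of that external-table route, assuming the dbo.customers import above landed in /tmp/dbo-customer with Sqoop's default comma delimiter (the column names and types here are hypothetical):
CREATE EXTERNAL TABLE customers (id INT, name STRING, email STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/dbo-customer';
Since the table is external, dropping it later removes only the Hive metadata and leaves the files in HDFS untouched.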

Have you used the specific --hive-import switch on the sqoop command line?
sqoop import --connect 'jdbc:sqlserver://sqlservername.mycompany.com;username=hadoop;password=hadoop;database=hadoop' --table dataforhive --hive-import

Just create an external Hive table on the path in HDFS, or use --hive-import.
Either of the two should work for you.

I also had the same problem: I could store my MySQL table in HDFS but couldn't store it in Hive. I simply imported the table into Hive using the following command, without storing it in HDFS again, and it worked for me.
sqoop import --connect jdbc:mysql://ipAddress:portNo/mysqldatabase --table mysqltablename --username mysqlusername --password mysqlpassword --hive-import --hive-table hivedatabase.hivetablename

Related

Unable to list the Oracle table names with Sqoop

I am trying to connect to an Oracle DB and list the names of the tables with Sqoop like this:
sqoop list-tables --connect jdbc:oracle:thin:@<db server>:1521:DB_Name--username hdp --password hadoop
I don't get any errors back. There are a bunch of tables on the database server, but I cannot get them listed with Sqoop. Any ideas what I am missing? I temporarily gave DBA rights to the hdp user; still cannot get the list of tables. Any ideas?
You should add a space before the double dash:
sqoop list-tables --connect jdbc:oracle:thin:@<db server>:1521:DB_Name --username hdp --password hadoop
And from what I saw in the documentation, the connect string can also use the service-name format:
sqoop list-tables --connect jdbc:oracle:thin:@//<db server>:1521/DB_Name --username hdp --password hadoop
If you only need the list of the tables in Oracle, why not use sqlplus?
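For instance, a quick sanity check from sqlplus (a sketch, assuming the same hdp account; the EZConnect syntax wants a service name, which may differ from the SID used above):
sqlplus hdp/hadoop@//<db server>:1521/DB_Name
SQL> SELECT table_name FROM user_tables;
If user_tables comes back empty, the hdp schema owns no tables, which would also explain why sqoop list-tables prints nothing: it lists the tables owned by the connecting user.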

Oraoop disabled for Sqoop import

I'm using the Hortonworks HDP Sandbox, and I've installed Oraoop per the instructions, but whenever I run a Sqoop import I get the message "oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled." I'm not sure what else I need to do for it to pick it up. I have verified that the Oraoop driver is in my sqoop lib directory. The imports do work, but they are just using the Oracle driver, and I would like to play around with some of the features that you get with Oraoop.
This is the command I'm running:
sqoop-import --connect jdbc:oracle:thin:@<ip>:1521/sid --username myUser -P --query "select * from mytable where \$CONDITIONS" --split-by sequence_id --as-sequencefile --target-dir /user/hue/data/deactivatedsponsor
If the --query argument is specified in place of --table, the Oraoop connector is not used.
The following is mentioned in the Sqoop documentation:
The Data Connector for Oracle and Hadoop accepts responsibility for those Sqoop jobs with the following attributes:
Oracle-related
Table-Based - Jobs where the table argument is used and the specified object is a table.
The following command should use the Oraoop connector. I have included the --direct option as well, which indicates to Sqoop that Oraoop should be used.
sqoop-import --connect jdbc:oracle:thin:@<ip>:1521/sid --direct --username myUser -P --table mytable --split-by sequence_id --as-sequencefile --target-dir /user/hue/data/deactivatedsponsor --columns <columns list> --where <where condition if needed>
The Oraoop connector cannot handle the --query argument; when you use --query, Sqoop automatically falls back to the generic JDBC connector.
So instead of using --query, use --table for the import.
Hope this helps!!

sqoop import is showing an error

I'm using Hadoop 2.5.1 and Sqoop 1.4.6.
I am using sqoop import for importing a table from a MySQL database to be used with Hadoop. It shows the following error.
Sqoop Command
sqoop import --connect jdbc:mysql://localhost/<dbname> --username hadoopsqoop --password hadoop#123 --table tablename -m 1
Exception
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSOutputSummer
Is there any way to figure out the issue?
I figured out the issue: I set HADOOP_HOME correctly and it solved my problem.
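For reference, a minimal sketch of those settings, e.g. in ~/.bashrc (the install path is an assumption; point it at your own Hadoop directory):
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin
Sqoop locates the Hadoop jars through these variables, so a NoSuchMethodError like the one above often means it was running against jars from a different Hadoop version.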
How can you import without mentioning where to store the file? Try this:
sqoop import --connect jdbc:mysql://localhost/dbname --username hadoopsqoop --password hadoop#123 --table tablename --target-dir 'hdfspath' -m 1

Sqoop import issue with MySQL

I have a Hadoop HA setup based on CDH5. I tried to import tables from MySQL using Sqoop, and it failed with the following error.
15/03/20 12:47:53 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic#33573e93 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic#33573e93 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
I have used the command below.
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/haddata --username root --password password --table authors --hive-import
My MySQL server version is 5.1.73-3, and I used versions 5.1.34 and 5.1.17 of mysql-connector-java.
The Sqoop version is 1.4.5-cdh5.3.2.
Please let me know any suggestions/comments.
Try including the option --driver com.mysql.jdbc.Driver in the import command.
Try using the modified command below, which should suit your purpose:
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/haddata --driver com.mysql.jdbc.Driver --username root --password password --table authors --hive-import
Include the driver argument --driver com.mysql.jdbc.Driver in the sqoop command:
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/<db name> --username **** --password **** --table <table name> --hive-import --driver com.mysql.jdbc.Driver
The --driver parameter forces Sqoop to use the latest mysql-connector-java.jar installed for MySQL on the Sqoop machine.
Try mysql-connector-java-5.1.31.jar; it is compatible with Sqoop 1.4.5.
The mysql-connector-java-5.1.17.jar driver does not work with Sqoop 1.4.5.
Refer to:
https://issues.apache.org/jira/browse/SQOOP-1400
If you have com.mysql.jdbc_5.1.5.jar or any version of com.mysql.jdbc_5.X.X.jar in your $HADOOP_HOME/bin folder, remove it and then execute your Sqoop query.
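As a rough sketch (the jar name is the example from above; check what actually sits in that folder first):
ls $HADOOP_HOME/bin/com.mysql.jdbc_*.jar
rm $HADOOP_HOME/bin/com.mysql.jdbc_5.1.5.jar
A stray connector jar there can shadow the one installed in Sqoop's own lib directory and reintroduce the version mismatch.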
Including the option --driver com.mysql.jdbc.Driver in the import command worked for me.
Sqoop does not ship with third-party JDBC drivers. You must download them separately and save them to the /var/lib/sqoop/ directory on the server.
Note:
The JDBC drivers need to be installed only on the machine where Sqoop runs. You do not need to install them on all hosts in your Hadoop cluster.
You can download the driver from here: https://dev.mysql.com/downloads/connector/j/5.1.html
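For example, a sketch of installing the driver by hand (the version number is an assumption; use whichever 5.1.x archive you downloaded from that page):
tar -xzf mysql-connector-java-5.1.46.tar.gz
cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar /var/lib/sqoop/
/var/lib/sqoop/ is the convention on CDH-style installs such as the cdh5 setup above; on a plain tarball install, $SQOOP_HOME/lib serves the same purpose.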
Try the exact command below:
sqoop import --connect "jdbc:mysql://localhost:3306/books" --username root --password root --table authors --as-textfile --target-dir /datasqoop/authors_db --columns "id, name, email" --split-by id --driver com.mysql.jdbc.Driver
This will resolve your issues.
Find the jar location being used by Sqoop; in my case, it points to the link /usr/share/java/mysql-connector-java.jar.
When I check that link, it points to mysql-connector-java-5.1.17.jar:
/usr/share/java/mysql-connector-java.jar -> mysql-connector-java-5.1.17.jar
As 5.1.17 has this issue, try 5.1.37 or higher (note that ln -s takes the target first, then the link name):
unlink /usr/share/java/mysql-connector-java.jar
ln -s /usr/share/java/mysql-connector-java-5.1.37.jar /usr/share/java/mysql-connector-java.jar
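You can then confirm that the link points at the newer driver:
ls -l /usr/share/java/mysql-connector-java.jar
The listing should now show the link resolving to mysql-connector-java-5.1.37.jar.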

Moving Sqoop data from HDFS to Hive

When importing a bunch of large MySQL tables into HDFS using Sqoop, I forgot to include the --hive-import flag. So now I've got these tables sitting in HDFS, and am wondering if there's an easy way to load the data into Hive (without writing the LOAD DATA statements myself).
I tried using sqoop create-hive-table:
./bin/sqoop create-hive-table --connect jdbc:mysql://xxx:3306/dw --username xxx --password xxx --hive-import --table tweets
While this did create the correct hive table, it didn't import any data into it. I have a feeling I'm missing something simple here...
For the record, I am using Elastic MapReduce, with Sqoop 1.4.1.
Can't you create an external table in Hive and point it to these files? Sqoop's default text output is comma-delimited, so declare that in the row format:
create external table something(a string, b string) row format delimited fields terminated by ',' location 'hdfs:///some/path';
You did not specify "import" in your command. The syntax is: sqoop tool-name [tool-arguments]
It should look like this:
$ sqoop import --create-hive-table --connect jdbc:mysql://xxx:3306/dw --username xxx --password xxx --hive-import --table tweets
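Alternatively, since the data is already sitting in HDFS, you can keep those files and load them in yourself. A sketch, assuming the files were imported with Sqoop's default comma delimiter and landed under /user/hadoop/tweets (both are assumptions to adjust for your setup):
./bin/sqoop create-hive-table --connect jdbc:mysql://xxx:3306/dw --username xxx --password xxx --table tweets --fields-terminated-by ','
hive -e "LOAD DATA INPATH '/user/hadoop/tweets' INTO TABLE tweets;"
Note that LOAD DATA INPATH moves the HDFS files under the table's warehouse directory rather than copying them, so they disappear from the original path afterwards.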
