hadoop sqoop load csv file into mysql

I am learning Hadoop Sqoop. I am working on the Hortonworks Sandbox (a single-node Hadoop virtual machine: http://hortonworks.com/products/hortonworks-sandbox/#install).
I am trying to load a CSV file via Sqoop into a MySQL table. I have created a database flightinfo and a table weather in it. I also created a Hive table called sqoop_tmp that points to the location of that CSV file.
I used the following command to load the CSV into MySQL:
sqoop export --connect jdbc:mysql://localhost/flightinfo –-table weather –-export-dir /apps/hive/warehouse/sqoop_tmp
Here is the error message:
Update (in reply to @z-1): I tried your code and it returns something different.

You have to provide the MySQL username and password, and that user must have permission to access the database.
It looks like you did a fresh install of MySQL and didn't configure a password to secure the root account.
You can do that using the following steps:
$ mysqladmin -u root password "newpassword"
For example: $ mysqladmin -u root password mysql-root-password
Restart the MySQL daemon: $ sudo service mysqld restart or $ sudo /etc/init.d/mysqld restart
Log in to MySQL using $ mysql -u root -p and enter the password when prompted.
If your database was created as the root user, you can now issue the following sqoop command to export data from HDFS to the MySQL database.
$ sqoop export --connect jdbc:mysql://localhost/flightinfo --username root -P --table weather --export-dir /apps/hive/warehouse/sqoop_tmp
Still facing a permission issue? Then the problem is that the user does not have permission to access the database.
You should be able to solve it by granting permissions using the steps below.
Log in to MySQL using $ mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON flightinfo.* TO 'root'@'localhost' IDENTIFIED BY 'password';
Note: the password you set here is the one you should use from Sqoop.
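Putting the steps together, a minimal end-to-end sketch (assuming the sandbox's MySQL service is named mysqld and using the database, table, and export directory from the question):
# set the root password and restart MySQL
mysqladmin -u root password 'newpassword'
sudo service mysqld restart
# grant access on the flightinfo database and reload the grant tables
mysql -u root -p -e "GRANT ALL PRIVILEGES ON flightinfo.* TO 'root'@'localhost' IDENTIFIED BY 'newpassword'; FLUSH PRIVILEGES;"
# export the Hive-staged CSV data into the MySQL weather table
sqoop export --connect jdbc:mysql://localhost/flightinfo --username root -P --table weather --export-dir /apps/hive/warehouse/sqoop_tmp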

Try with the MySQL username and password:
sqoop export --connect jdbc:mysql://localhost/flightinfo --table weather --export-dir /apps/hive/warehouse/sqoop_tmp --username SomeUser -P
Note: the user should have permission on the database.

Try this:
sqoop export --connect jdbc:mysql://localhost/flightinfo --table weather --export-dir /apps/hive/warehouse/sqoop_tmp
You had –- (an en dash followed by a hyphen), not -- (two hyphens). These two are not the same.

Related

HDP Sandbox SQOOP failed due to permission error

Below is the error message:
Unable to move source
hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/DimDepartmentGroup/part-m-00000
to destination
hdfs://sandbox-hdp.hortonworks.com:8020/warehouse/tablespace/managed/hive/dbodimemployee/delta_0000001_0000001_0000:
Permission denied: user=hive, access=WRITE,
inode="/user/maria_dev/DimDepartmentGroup":maria_dev:hdfs:drwxr-xr-x
I am totally confused. The error message itself shows that maria_dev has write permission on the folder: inode="/user/maria_dev/DimDepartmentGroup":maria_dev:hdfs:drwxr-xr-x
What did I miss?
When you run a Sqoop Hive import, it generally first loads the data from your external database and stores it as a multi-part file at the given location (--target-dir /goldman/yahoo), and then moves it from that location into the Hive table (--hive-table topclient.mpool).
Access can be denied at two levels:
1) If you see access denied at the file location /goldman/yahoo, set that location's permissions to 777 while running as the hdfs user: sudo -u hdfs hadoop fs -chmod 777 /goldman/yahoo
2) If you see access denied while creating the table, run the sqoop command as the user hive, because the hive user has access to Hive tables, i.e.
sudo -u hive sqoop import --connect 'jdbc:sqlserver://test.goldman-invest.data:1433;databaseName=Investment_Banking' --username user_***_cqe --password ****** --table cases --target-dir /goldman/yahoo --hive-import --create-hive-table --hive-table topclient.mpool
Finally, I got it to work. I logged in as root and switched to the hive user using su - hive.
Then I was able to run the Sqoop command successfully. Previously I had logged in as maria_dev and could not use the su command. I do not have the password for the hive user because hive is not a regular user in the HDP sandbox.
Still, it seems strange to me that a user needs root access to load some data into Hive on HDP.
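An alternative that avoids switching users, along the lines of point 1) above, is to relax the permissions on the staging directory in HDFS so the hive user can move files out of it (a sketch; the directory comes from the error message, and 777 is deliberately blunt, so narrow it if needed):
# run as the hdfs superuser so the chmod is allowed
sudo -u hdfs hadoop fs -chmod -R 777 /user/maria_dev/DimDepartmentGroup
# then re-run the sqoop job as maria_dev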

unable to list the oracle table names with sqoop

I am trying to connect to an Oracle DB and list the names of the tables with sqoop, like this:
sqoop list-tables --connect jdbc:oracle:thin:@<db server>:1521:DB_Name--username hdp --password hadoop
I don't get any errors back. There are a bunch of tables on the database server, but I cannot get them listed with sqoop. I temporarily gave DBA rights to the hdp user and still cannot get the list of tables. Any ideas what I am missing?
You should add a space before the double dash:
sqoop list-tables --connect jdbc:oracle:thin:@<db server>:1521:DB_Name --username hdp --password hadoop
And from what I saw in the documentation, the format should be something like:
sqoop --connect jdbc:oracle:thin:@//<db server>:1521/DB_Name --username hdp --password hadoop --list-tables
If you only need the list of the tables in Oracle, why not use SQL*Plus?
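For example, listing the tables visible to the hdp user directly from SQL*Plus (a sketch; the EZConnect string assumes DB_Name is a service name, so adjust it if you connect by SID):
sqlplus hdp/hadoop@//<db server>:1521/DB_Name
-- then, at the SQL prompt:
SELECT table_name FROM user_tables;
-- or, with the DBA rights you granted, tables owned by any schema:
SELECT owner, table_name FROM all_tables;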

Import data from Oracle(Windows) to HDFS (CDH3) machine using sqoop

Hi, I am taking a training in Hadoop. I have a task in which I have to import a table's data from Oracle (Windows, 11g XE) to HDFS using Sqoop. I am reading the following article. My question is: how exactly do I import data from Windows to HDFS? Normally I use WinSCP to transfer files from Windows to the HDFS machine. I have imported data from MySQL, which was installed on the HDFS (CDH3) machine, but I don't know how to import data from Oracle on Windows to HDFS. Please help.
Link that I am following
Following is the step-wise process:
1. Connect to the Oracle SQL command line and log in with your credentials:
e.g. username: system, password: system
(Make sure that this user has all administrative privileges, or connect as sysdba in Oracle and create a new user with all privileges.)
Create a user with all privileges in Oracle
Create tables under that user and insert some values and commit
2. Now we need a connector for transferring our data from Oracle to HDFS.
So we need to download the Oracle JDBC connector jar file (ojdbc6.jar) and place it in the following path on CDH3 (use sudo in your commands while copying to this path, as it requires admin access in Linux):
/usr/lib/sqoop/bin
Download link: http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html
Use WinSCP to transfer the downloaded jar from Windows to CDH3, then move it to the above-mentioned path on CDH3.
3. Command:
sudo bin/sqoop import --connect jdbc:oracle:thin:system/system@192.168.XX.XX:1521:xe --username system -P --table system.emp --columns "ID" --target-dir /sqoopoutput1 -m 1
/sqoopoutput1 is the output directory in HDFS where you will get your data; you can change this as per your needs.
-m 1: this sets the number of mappers for this sqoop job; here it is 1.
192.168.XX.XX:1521: the IP address and port of your Windows machine, where Oracle is running.
You don't need to export the data from Oracle to your local machine, then copy it to the HDFS machine, and then load it into HDFS.
Sqoop is there to import your RDBMS tables directly into an HDFS directory.
Use the command:
sqoop import --connect 'jdbc:oracle:thin:@192.xx.xx.xx:1521:ORCL' --username testuser --password testpassword --table testtable --target-dir /tmp/testdata
Go to the machine on which Sqoop is running and open a terminal (I believe it's Linux). Just fire the above-mentioned command and check --target-dir (I used /tmp/testdata in the example command) in HDFS. You will find files corresponding to your Oracle table there.
Check sqoop docs for more details.
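To confirm the import worked, you can inspect the target directory from that same terminal (a quick check, assuming the /tmp/testdata path used above):
hadoop fs -ls /tmp/testdata
# should list part-m-* files (and a _SUCCESS marker)
hadoop fs -cat /tmp/testdata/part-m-00000 | head
# peek at the first few imported rows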

sqoop import issue with mysql

I have a Hadoop HA setup based on CDH5. I tried to import tables from MySQL using Sqoop, and it failed with the following error:
15/03/20 12:47:53 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@33573e93 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@33573e93 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
I have used the below command:
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/haddata --username root --password password --table authors --hive-import
My MySQL server version is 5.1.73-3, and I have used versions 5.1.34 and 5.1.17 of mysql-connector-java.
The Sqoop version is 1.4.5-cdh5.3.2.
Please let me know any suggestions/comments.
Try including the option --driver com.mysql.jdbc.Driver in the import command.
Try using the modified command below, which should suit your purpose:
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/haddata --driver com.mysql.jdbc.Driver --username root --password password --table authors --hive-import
follow this link
Include the driver argument --driver com.mysql.jdbc.Driver in the sqoop command.
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/<db name> --username **** --password **** --table <table name> --hive-import --driver com.mysql.jdbc.Driver
The --driver parameter forces Sqoop to use the latest mysql-connector-java.jar installed for MySQL on the Sqoop machine.
Try mysql-connector-java-5.1.31.jar; it is compatible with sqoop 1.4.5.
mysql-connector-java-5.1.17.jar driver does not work with sqoop 1.4.5.
Refer to: https://issues.apache.org/jira/browse/SQOOP-1400
If you have com.mysql.jdbc_5.1.5.jar or any version of com.mysql.jdbc_5.X.X.jar in the $HADOOP_HOME/bin folder, remove it and then execute your Sqoop query.
Including the option --driver com.mysql.jdbc.Driver in the import command worked for me.
Sqoop does not ship with third-party JDBC drivers. You must download them separately and save them to the /var/lib/sqoop/ directory on the server.
Note:
The JDBC drivers need to be installed only on the machine where Sqoop runs. You do not need to install them on all hosts in your Hadoop cluster.
You can download the driver from here: https://dev.mysql.com/downloads/connector/j/5.1.html
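A minimal sketch of installing the connector jar (the version number and archive layout here are examples; use whatever you actually downloaded):
# unpack the Connector/J archive you downloaded
tar -xzf mysql-connector-java-5.1.37.tar.gz
# copy the driver jar into the directory Sqoop looks in for JDBC drivers
sudo cp mysql-connector-java-5.1.37/mysql-connector-java-5.1.37-bin.jar /var/lib/sqoop/
sudo chmod 644 /var/lib/sqoop/mysql-connector-java-5.1.37-bin.jar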
Try the exact command below:
sqoop import --connect "jdbc:mysql://localhost:3306/books" --username=root --password=root --table authors --as-textfile --target-dir=/datasqoop/authors_db --columns "id, name, email" --split-by id --driver com.mysql.jdbc.Driver
This will resolve your issues.
Find the jar location being used by Sqoop; in my case it points to the link /usr/share/java/mysql-connector-java.jar.
When I check that link, it points to mysql-connector-java-5.1.17.jar:
/usr/share/java/mysql-connector-java.jar -> mysql-connector-java-5.1.17.jar
As 5.1.17 has this issue, try 5.1.37 or higher:
unlink /usr/share/java/mysql-connector-java.jar
ln -s /usr/share/java/mysql-connector-java-5.1.37.jar /usr/share/java/mysql-connector-java.jar
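Before re-running the import, it is worth checking that the link now resolves to the newer driver (a quick check, assuming the 5.1.37 jar was placed in /usr/share/java as above):
ls -l /usr/share/java/mysql-connector-java.jar
# should now show: mysql-connector-java.jar -> mysql-connector-java-5.1.37.jar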

I couldn't import the tables from SQL Server to Hive through Sqoop

When I pass the command:
$sqoop create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433;username=cloud;password=cloud123;database=hadoop' --table cluster
Some errors and warnings appear, and at the end it says:
Failed to start database '/var/lib/hive/metastore/metastore_db', see the next exception for details [again a list of errors is displayed]
Finally it says hive exited with status 9.
What is the problem here? I am new to Sqoop and Hive. Can anyone please help me?
The correct syntax would be
sqoop import --connect 'jdbc:sqlserver://10.100.0.18:1433/hadoop' --username cloud --password cloud123 --table cluster --hive-import
I think you might want to check if you have write permissions to the specified directory and whether a directory named metastore_db is being created.
This message is usually shown when you're running Sqoop with the default Hive configuration. Hive will by default use the embedded Derby datastore, which is usable only in very basic test use cases. I would recommend reconfiguring your Hive instance to use some other relational database as the metastore back end (MySQL, PostgreSQL, Oracle).
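A rough sketch of what that reconfiguration involves (the hostnames and credentials below are placeholders, and schematool is only available in newer Hive releases):
# in hive-site.xml, point the metastore at MySQL instead of embedded Derby:
#   javax.jdo.option.ConnectionURL        = jdbc:mysql://metastore-host/metastore
#   javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
#   javax.jdo.option.ConnectionUserName   = hiveuser
#   javax.jdo.option.ConnectionPassword   = hivepassword
# then create the metastore schema:
schematool -dbType mysql -initSchema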
Your syntax is all wrong. The syntax is: $ sqoop tool-name [tool-arguments]
$sqoop import --create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433/hadoop' --username cloud --password cloud123 --table cluster
Pasting a sample call of a Hive import using Sqoop. This might help you correct your syntax further. Remember that, essentially, you need to give at minimum the command below to make it work.
sqoop import --connect jdbc:mysql://localhost/RAWDATA --table geolocation --username root --password hadoop --hive-import --create-hive-table --driver com.mysql.jdbc.Driver --m 1 --delete-target-dir
--connect: the part which reads /RAWDATA is the name of the database in your MySQL instance which contains the geolocation table. You can execute the 'show databases' and 'show tables' commands in MySQL to check your databases and tables.
--delete-target-dir is used for safety. It ensures Sqoop deletes the temporary directory it creates to write the files before moving them into Hive. This avoids unnecessary 'directory already exists' errors in case you retry the command.
--create-hive-table is required only if you did not already create the target table in Hive. If a previous run of the sqoop command already created the table, you can ignore this option completely. Check your Hive database for the existence of the target Hive table.
--driver is a mandatory part of the command to perform any database connection. Make sure you either find the right path to the driver library or try googling for options. You can try the one pasted above first to see if it does the trick, and come back to this forum for help.
Remember that we did not mention which Hive database the table will be created in, so it will go into Hive's default database. I am not covering that option since you are just starting out with Sqoop.
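Once the import finishes, you can confirm the result from the Hive side (a quick check, assuming the geolocation table from the example above landed in the default database):
hive -e "SHOW TABLES;"
# geolocation should appear in the list
hive -e "SELECT * FROM geolocation LIMIT 5;"
# peek at a few imported rows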
