SQOOP connection-param-file format - hadoop

In Sqoop for Hadoop you can use a parameters file for connection string information.
--connection-param-file filename Optional properties file that provides connection parameters
What is the format of that file?
Say for example I have:
jdbc:oracle:thin:@//myhost:1521/mydb
How should that be in a parameters file?

If you want to provide your database connection string and credentials, create a file with those details and use --options-file in your Sqoop command.
Create a file database.props with the following details:
import
--connect
jdbc:mysql://localhost:5432/test_db
--username
root
--password
password
Then your Sqoop import command will look like:
sqoop --options-file database.props \
--table test_table \
--target-dir /user/test_data
As for --connection-param-file, I hope this link will be helpful for understanding its usage.
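Regarding the format of --connection-param-file itself: it is a standard Java properties file, one key=value pair per line, and each entry is passed to the JDBC driver as a connection property, while the JDBC URL still goes with --connect. A minimal sketch of such a file (the property names below are only illustrative Oracle driver properties, not taken from the question):
# connection.props - extra JDBC connection properties, one key=value per line
oracle.jdbc.ReadTimeout=60000
defaultRowPrefetch=50
It would then be passed as --connection-param-file connection.props next to --connect jdbc:oracle:thin:@//myhost:1521/mydb.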

It should be the same as on the command line.
Example
import
--connect
jdbc:oracle:thin:@//myhost:1521/mydb
--username
foo

Below is a sample command for connecting to a MySQL server:
sqoop list-databases --connect jdbc:mysql://192.168.256.156/test --username root --password root
It will give you the list of databases available on your MySQL server.

Related

I wanted to know why the tables which I imported from SQL Server into a Hive db using Sqoop are disappearing.

So I'm trying to import-all-tables into the Hive db, i.e., /user/hive/warehouse/... on HDFS, using the command below:
sqoop import-all-tables --connect "jdbc:sqlserver://<servername>;database=<dbname>" \
--username "<username>" \
--password "<password>" \
--warehouse-dir "/user/hive/warehouse/" \
--hive-import \
-m 1
In the test database I have 3 tables. When the MapReduce job runs, the output is success, i.e., the job is 100% complete, but the files are not found in the Hive db.
It's basically getting overwritten by the last table; try removing the forward slash at the end of the directory path. For testing I would suggest not using the warehouse directory; use something like '/tmp/sqoop/allTables' instead.
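A sketch of the adjusted command along those lines (server, database and credentials stay as the placeholders used in the question):
sqoop import-all-tables --connect "jdbc:sqlserver://<servername>;database=<dbname>" \
--username "<username>" \
--password "<password>" \
--warehouse-dir "/tmp/sqoop/allTables" \
--hive-import \
-m 1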
There is another way:
1. Create a Hive database pointing to a location, say "targetLocation" (see the sketch after this answer).
2. Create an HCatalog table in your Sqoop import using the previously created database.
3. Use the target-dir import option to point to that targetLocation.
You don't need to define the warehouse directory; just define the Hive database and it will automatically find the working directory.
sqoop import-all-tables --connect "jdbc:sqlserver://xxx.xxx.x.xxx:xxxx;databaseName=master" --username xxxxxx --password xxxxxxx --hive-import --create-hive-table --hive-database test -m 1
It will just run like a rocket.
Hope it works for you.
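To make the steps above concrete, here is a minimal sketch; the database name, location and connection placeholders are assumptions for illustration, not taken from the answer:
# step 1: create a Hive database whose data lands under a chosen target location
hive -e "CREATE DATABASE IF NOT EXISTS test LOCATION '/data/targetLocation';"
# steps 2-3: import through HCatalog into that database, so the table data lands under its location
sqoop import --connect "jdbc:sqlserver://<servername>;databaseName=<dbname>" \
  --username <username> --password <password> \
  --table <table> \
  --hcatalog-database test \
  --hcatalog-table <table> \
  --create-hcatalog-table \
  -m 1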

How to store a password in a password file for Sqoop

I want to store the password in a file and later use it in a Sqoop command.
According to the Sqoop documentation, the --password-file option allows storing the password in a file, so I am storing it in a pwd file containing only the text abc, and running the command below.
sqoop import --connect jdbc:mysql://localhost:3306/db --username bhavesh --password-file /pwd --table t1 --target-dir '/erp/test'
(assuming the pwd file is stored on HDFS)
As a result I am getting the following error:
java.sql.SQLException: Access denied for user 'bhavesh'@'localhost' (using password: YES)
When I perform the same operation using the -P option it works fine for me.
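One thing worth checking with --password-file: Sqoop reads the whole file content as the password, including any trailing newline, so the file must contain the password and nothing else. A sketch of creating such a file on HDFS, reusing the /pwd path and the abc password from the question:
# write the password without a trailing newline
echo -n "abc" > pwd
# copy it to HDFS and restrict its permissions
hdfs dfs -put pwd /pwd
hdfs dfs -chmod 400 /pwd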
For a saved sqoop job, I was getting the same error.
I stored the password in the metastore and that worked for me.
Make the change in the following configuration property in the file sqoop-site.xml, which is usually located at /etc/sqoop/conf/sqoop-site.xml:
<property>
<name>sqoop.metastore.client.record.password</name>
<value>true</value>
<description>If true, allow saved passwords in the metastore.
</description>
</property>
After making these changes, create the Sqoop job; by running the following command you should be able to see the stored password:
sqoop job --show [job_name]
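For reference, a sketch of the saved-job flow this relies on; the job name is made up for illustration and the connection details reuse the question's:
# create a saved job; with sqoop.metastore.client.record.password=true
# the password supplied here is recorded in the metastore
sqoop job --create mysql_t1_import -- import \
  --connect jdbc:mysql://localhost:3306/db \
  --username bhavesh --password <password> \
  --table t1 --target-dir /erp/test
# run it later without re-entering the password
sqoop job --exec mysql_t1_import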
You can store the credential on HDFS.
Create the credential using this command:
hadoop credential create mysql.password -provider jceks://hdfs/user/<your_hadoop_username>/mysqlpwd.jceks
When executed on the client machine it will ask you to provide a password; enter the MySQL password that you previously supplied with the -P option of the sqoop command.
sqoop import --connect jdbc:mysql://localhost:3306/db --username bhavesh --password-alias mysql.password --table t1 --target-dir /erp/test
And run this modified command, in which I have replaced
--password-file
with
--password-alias
The file in HDFS contains the password in an encrypted format that cannot be read back directly.
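Depending on the Sqoop and Hadoop versions, the command may also need to be told where the credential store lives; a sketch of passing the provider path explicitly, reusing the jceks path created above and the rest of the question's command:
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/<your_hadoop_username>/mysqlpwd.jceks \
  --connect jdbc:mysql://localhost:3306/db \
  --username bhavesh \
  --password-alias mysql.password \
  --table t1 --target-dir /erp/test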

Importing Data In Sqoop 1.99.6

I've been able to import data from Oracle in Sqoop 1.99.6 by creating links and jobs. However, I was wondering if the following syntax can be used to import data:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities
I could only find the sqoop.sh file, and not a sqoop file, in the /<sqoop_home>/bin directory.
Thanks.
The syntax below can be used to import data from an Oracle database to HDFS using Sqoop:
/usr/bin/sqoop import --connect jdbc:oracle:thin:system/system@<hostname>:1521:xe --username <username> -P --table <schema>.<table> --columns "<columns>" --target-dir <target_dir> -m 1

Hadoop Sqoop export table to SQL Server error

I am new to Hadoop and have recently started working on Sqoop. While trying to export a table from Hadoop to SQL Server I am getting the following error:
input path does not exist hdfs://sandbox:8020/user/root/
The command I am using is:
sqoop export --connect "jdbc:sqlserver://<server>;username=<user>;password=xxxxx;database=<dbname>" --table <table> --export-dir /user/root/ --input-fields-terminated-by " "
Could you please guide me on what I am missing here?
Also could you please let me know the command to navigate to the hadoop directory where the tables are stored.
For a proper Sqoop export, Sqoop requires the complete data file location; you can't just specify the root folder.
Try specifying the complete source path:
sqoop export --connect jdbc:oracle:thin:<>:1521/<> --username <> --password <> --table <> --export-dir hdfs://<>/user/<>/<> -m 1 --input-fields-terminated-by '|' --input-null-string '\\N' --input-null-non-string '\\N'
Hope this helps
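As for navigating the HDFS directory where the data files live, that is ordinary HDFS browsing; a short sketch, assuming the /user/root/ path from the question (the table subdirectory name is hypothetical):
# list what actually sits under /user/root/ to find the real data directory for --export-dir
hdfs dfs -ls /user/root/
hdfs dfs -ls /user/root/<table_dir>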

sqoop import multiple tables

We are using Cloudera CDH 4 and we are able to import tables from our Oracle databases into our HDFS warehouse as expected. The problem is we have tens of thousands of tables inside our databases, and Sqoop only supports importing one table at a time.
What options are available for importing multiple tables into HDFS or Hive? For example, what would be the best way of importing 200 tables from Oracle into HDFS or Hive at a time?
The only solution I have seen so far is to create a Sqoop job for each table import and then run them all individually. Since Hadoop is designed to work with large datasets, it seems like there should be a better way, though.
You can use the "import-all-tables" option to load all tables into HDFS at one time.
sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --target-dir '/Sqoop21/AllTables'
If we want to exclude some tables from being loaded into HDFS, we can use the "--exclude-tables" option.
Ex:
sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --target-dir '/Sqoop21/AllTables' --exclude-tables <table1>,<tables2>
If we want to store the output in a specific directory, we can use the "--warehouse-dir" option.
Ex:
sqoop import-all-tables --connect jdbc:mysql://localhost/sqoop --username root --password hadoop --warehouse-dir '/Sqoop'
Assuming that the sqoop configuration for each table is the same, you can list all the tables you need to import and then iterate over them launching sqoop jobs (ideally launch them asynchronously). You can run the following to fetch the list of tables from Oracle:
SELECT owner, table_name FROM dba_tables;
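A sketch of wiring that list into a loop that launches the imports asynchronously, as suggested; the connection URL, credentials, target path and the tables.txt file holding the query results are all assumptions for illustration:
# tables.txt: one OWNER.TABLE_NAME per line, produced from the dba_tables query above
while read tbl; do
  sqoop import --connect jdbc:oracle:thin:@//<host>:1521/<service> \
    --username <user> --password <password> \
    --table "$tbl" --target-dir "/data/raw/$tbl" -m 1 &   # launch in the background
done < tables.txt
wait   # block until all background imports have finished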
Sqoop does offer an option to import all tables. Check this link. There are some limitations though.
Modify sqoop source code and recompile it to your needs. The sqoop codebase is well documented and nicely arranged.
--target-dir is not a valid option when using import-all-tables.
To import all tables into a particular directory, use --warehouse-dir instead of --target-dir.
Example:
$ sqoop import-all-tables --connect jdbc:mysql://localhost/movies --username root --password xxxxx --warehouse-dir '/user/cloudera/sqoop/allMoviesTables' -m 1
The best option is to do it with a shell script:
1) Prepare an input file which has a list of DBNAME.TABLENAME entries. 2) The shell script takes this file as input, iterates over it line by line, and executes a Sqoop statement for each line (a usage sketch follows the script).
while read line;
do
  # split each DBNAME.TABLENAME line into its two parts
  DBNAME=`echo $line | cut -d'.' -f1`
  tableName=`echo $line | cut -d'.' -f2`
  # double quotes so that $JDBC_URL, $DBNAME, $USERNAME and $PASSWORD expand
  sqoop import -Dmapreduce.job.queuename=$QUEUE_NAME --connect "$JDBC_URL;databaseName=$DBNAME;username=$USERNAME;password=$PASSWORD" --table $tableName --target-dir $DATA_COLLECTOR/$tableName --fields-terminated-by '\001' -m 1
done < inputFile
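A sketch of the inputs this loop expects (all values below are made up for illustration):
# variables referenced by the loop above
QUEUE_NAME=default
JDBC_URL='jdbc:sqlserver://<server>:1433'
USERNAME='<user>'
PASSWORD='<password>'
DATA_COLLECTOR=/data/raw
# inputFile: one DBNAME.TABLENAME per line, e.g.
#   salesdb.customers
#   salesdb.orders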
You can probably import multiple tables: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal
You can use Sqoop's "import-all-tables" feature to import all the tables in the database. This also has another parameter, --exclude-tables, with which you can exclude some of the tables that you don't want to import from the database.
Note: --exclude-tables only works with the import-all-tables command.
Importing multiple tables with Sqoop, if the number of tables is very small:
Create a Sqoop import for each table as below.
sqoop import --connect jdbc:mysql://localhost/XXXX --username XXXX --password XXXX --table XXTABLE_1XX
sqoop import --connect jdbc:mysql://localhost/XXXX --username XXXX --password XXXX --table XXTABLE_2XX
and so on.
But what if the number of tables is 100, or 1000, or even more? The approach below would be ideal.
In such a scenario, prepare a shell script which takes as input a text file containing the list of table names to be imported, iterates over it, and runs the Sqoop import job for each table (as in the script shown earlier).
