How does --options-file differ from --connection-param-file?

The Sqoop documentation shows this example for --options-file:
#
# Options file for Sqoop import
#
# Specifies the tool being invoked
import
# Connect parameter and value
--connect
jdbc:mysql://localhost/db
# Username parameter and value
--username
foo
#
# Remaining options should be specified in the command line.
#
As per the above, if the file holds only connection information, and the comment says all remaining options should be specified on the command line, why is this an --options-file and not a --connection-param-file?

The comment "Remaining options should be specified in the command line" is misleading. It is there only to show that comments are possible in an options file; it does not mean you cannot specify more options in the file.
I use options files for Sqoop myself, and they contain connection details as well as options such as --num-mappers or --fields-terminated-by.
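The practical difference: an options file holds Sqoop command-line arguments themselves, one per line, while --connection-param-file takes a Java properties file of JDBC connection parameters that are handed to the driver. A minimal sketch of each (file names and parameter values here are placeholders):
# import.opts — read with: sqoop --options-file import.opts
import
--connect
jdbc:mysql://localhost/db
--username
foo
--num-mappers
4
# conn.properties — read with: --connection-param-file conn.properties
# plain key=value pairs passed straight to the JDBC driver
useUnicode=true
characterEncoding=utf8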

Related

Sqoop fails with password-file argument

I have a sqoop script which ingests data from SAP HANA to Hive. The script runs fine when I pass the password as an argument ("--password Password$$"), but to secure the password I put it in a file called sap.password and used the argument "--password-file /dev/configs/sap.password". Now the sqoop script returns an exception.
Below are my sqoop script and the exception that occurred:
sqoop import \
--connect jdbc:sap://hostname?currentschema=SCHEMA_REF \
--driver com.sap.db.jdbc.Driver \
--username SERVICE_ACCOUNT \
--password-file /dev/configs/sap.password \
--table TABLE1 \
--hive-import \
--hive-overwrite \
--hive-database cdc_stg \
--hive-table HIVE_TABLE1 \
--as-parquetfile \
--m 1
The exception that I get is (I'm sure the credentials are correct):
19/11/14 05:47:08 ERROR manager.SqlManager: Error executing statement:
com.sap.db.jdbc.exceptions.jdbc40.SQLInvalidAuthorizationSpecException: [10]: authentication failed
com.sap.db.jdbc.exceptions.jdbc40.SQLInvalidAuthorizationSpecException: [10]: authentication failed
at com.sap.db.jdbc.exceptions.jdbc40.SQLInvalidAuthorizationSpecException.createException(SQLInvalidAuthorizationSpecException.java:40)
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.createException(SQLExceptionSapDB.java:290)
at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.generateDatabaseException(SQLExceptionSapDB.java:174)
at com.sap.db.jdbc.packet.ReplyPacket.buildExceptionChain(ReplyPacket.java:100)
at com.sap.db.jdbc.ConnectionSapDB.execute(ConnectionSapDB.java:1141)
at com.sap.db.jdbc.ConnectionSapDB.execute(ConnectionSapDB.java:888)
at com.sap.db.util.security.AbstractAuthenticationManager.connect(AbstractAuthenticationManager.java:43)
at com.sap.db.jdbc.ConnectionSapDB.openSession(ConnectionSapDB.java:586)
at com.sap.db.jdbc.ConnectionSapDB.doConnect(ConnectionSapDB.java:436)
at com.sap.db.jdbc.ConnectionSapDB.<init>(ConnectionSapDB.java:195)
at com.sap.db.jdbc.ConnectionSapDBFinalize.<init>(ConnectionSapDBFinalize.java:13)
at com.sap.db.jdbc.Driver.connect(Driver.java:255)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:903)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:762)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:785)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:288)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:259)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:245)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:333)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1879)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1672)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:515)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:633)
at org.apache.sqoop.Sqoop.run(Sqoop.java:146)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:182)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:233)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:242)
at org.apache.sqoop.Sqoop.main(Sqoop.java:251)
19/11/14 05:47:08 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1678)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:515)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:633)
at org.apache.sqoop.Sqoop.run(Sqoop.java:146)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:182)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:233)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:242)
at org.apache.sqoop.Sqoop.main(Sqoop.java:251)
I suspect the password file was created with trailing newline characters, since --password works fine and the only change made is the switch to a password file.
Re-create the password file, following the warning from the Sqoop docs quoted below.
Reference: SqoopUserGuide
Sqoop will read the entire content of the password file and use it as a password. This will include any trailing white space characters such as newline characters that are added by default by most of the text editors. You need to make sure that your password file contains only characters that belong to your password. On the command line, you can use command echo with switch -n to store password without any trailing white space characters.
For example, to store the password secret:
echo -n "secret" > password.file
Also, instead of sqoop import, try list-databases, list-tables, or eval first to test the connection with the password file.
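For example, a minimal connection test reusing the values from your command (host and schema are placeholders from your script):
sqoop list-tables \
--connect jdbc:sap://hostname?currentschema=SCHEMA_REF \
--driver com.sap.db.jdbc.Driver \
--username SERVICE_ACCOUNT \
--password-file /dev/configs/sap.password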
Please check password file permissions. From Sqoop docs:
You should save the password in a file on the users home directory with 400 permissions
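Putting the pieces together, a sketch of creating the file correctly (paths are examples; the hdfs dfs -put step applies only if the file must live on HDFS rather than the local FS):
echo -n "YourPassword" > $HOME/sap.password
chmod 400 $HOME/sap.password
# optional: store it on HDFS instead of the local filesystem
hdfs dfs -put $HOME/sap.password /dev/configs/sap.password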

Is there a way to execute a free-form query from a file in Sqoop?

I have executed a similar sqoop command, shown below. I want to keep the free-form query in a file and have the sqoop command read it from there, since my real queries are quite complex and large.
Is there a way to keep the query in a file and have the sqoop command refer to the free-form query inside the file when it executes, like we do in the --password-file case? Thanks in advance.
sqoop import --connect "jdbc:mysql://<localhost>:port" --username "admin" --password-file "<passwordfile>" --query "select * from employee" --split-by employee_id --target-dir "<target directory>" --incremental append --check-column employee_id --last-value 0 --fields-terminated-by "|"
Command-line options that are inconvenient to put on the command line can be read from a file using Sqoop's --options-file argument, so you can read the query from an options file. Using an options file, the Sqoop command would look like this:
sqoop import --connect $connect_string --username $username --password $pwd --options-file /home/user/sqoop_poc/query.txt --target-dir $target_dir --m 1
The entry in the options file should look like this:
--query
select * from TEST_OPTION where ID <= 10 AND $CONDITIONS
More details on options file are available in Sqoop User Guide.
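If you prefer, the entire command can live in the options file, mirroring the example from the first question above; a minimal sketch (connection values and paths are placeholders):
# query.txt
import
--connect
jdbc:mysql://localhost:3306/test
--username
admin
--query
select * from TEST_OPTION where ID <= 10 AND $CONDITIONS
--target-dir
/user/hive/test_option
--split-by
ID
Invoked as:
sqoop --options-file /home/user/sqoop_poc/query.txt --m 1
One caveat from the Sqoop User Guide: a quoted value in an options file must not extend beyond the line on which it is specified, so keep the whole query on a single line.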

How to protect password and username in Sqoop?

I want to hide the password that I use to import data from my RDBMS to the Hadoop cluster. I am using --options-file to keep my password and username in a text file, but it is not protected.
Can I apply some kind of encryption to that particular file for better protection?
A secure way of supplying the password to the database:
You should save the password in a file in the user's home directory with 400 permissions and specify the path to that file using the --password-file argument; this is the preferred method of entering credentials. Sqoop will then read the password from the file and pass it to the MapReduce cluster using secure means, without exposing the password in the job configuration. The file containing the password can be either on the local FS or on HDFS. For example:
$ sqoop import --connect jdbc:mysql://database.example.com/employees \
--username venkatesh --password-file ${user.home}/.password
Check the Sqoop docs for more details.
Also, you can use the -P option to read the password from the console.
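With -P, Sqoop prompts for the password at runtime, so it never appears in the command line or shell history; for example:
sqoop import --connect jdbc:mysql://database.example.com/employees --username venkatesh -P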
This question also seems to have been addressed previously here, and is described on this Hortonworks page; it basically consists of creating an .enc file. You also need to configure several parameters, such as the passphrase used to decrypt the file.
sqoop import \
-Dorg.apache.sqoop.credentials.loader.class=org.apache.sqoop.util.password.CryptoFileLoader \
-Dorg.apache.sqoop.credentials.loader.crypto.passphrase=sqoop2 \
--connect jdbc:mysql://example.com/sqoop \
--username sqoop \
--password-file file:///tmp/pass.enc \
--table tbl
Here are the parameters that can be configured (again following the reference):
org.apache.sqoop.credentials.loader.class - the credentials loader
org.apache.sqoop.credentials.loader.crypto.alg - the algorithm used to decrypt the file (default is AES/ECB/PKCS5Padding)
org.apache.sqoop.credentials.loader.crypto.salt - the salt used to derive a key with the passphrase (default is SALT)
org.apache.sqoop.credentials.loader.crypto.iterations - the number of PBKDF2 iterations (default is 10000)
org.apache.sqoop.credentials.loader.crypto.salt.key.len - the derived key length (default is 128)
org.apache.sqoop.credentials.loader.crypto.passphrase - the passphrase used to derive the key
Alternatively, you can follow the Sqoop documentation page and create a password alias that gets retrieved by an implementation of the CredentialProviderPasswordLoader class. You can see the whole class here.
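A sketch of that alias approach using the Hadoop credential provider API (the alias name and jceks path are placeholders):
# store the password once, encrypted in a JCEKS keystore (prompts for the secret)
hadoop credential create mydb.password.alias \
-provider jceks://hdfs/user/sqoop/mydb.password.jceks
# reference the alias instead of a password
sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/sqoop/mydb.password.jceks \
--connect jdbc:mysql://example.com/sqoop \
--username sqoop \
--password-alias mydb.password.alias \
--table tbl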

What is the intention of Sqoop options? Single - vs double -- minus difference?

In the given example, username is preceded by a single -, whereas --connect, --table, and the other options are preceded by a double --. What is the intention of these Sqoop options? Where should I use a single dash and where a double dash?
sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root \
-password password --table tableName --hive-table tableName \
--create-hive-table --hive-import --hive-home path/to/hive_home
Generic Hadoop arguments are preceded by a single dash character (-), whereas sqoop arguments start with two dashes (--), unless they are single character arguments such as -P.
Generic hadoop command-line arguments supported are:
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
FYI: You must supply the generic hadoop arguments -conf, -D, and so on after the tool name but before any tool-specific arguments (such as --connect).
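A quick sketch of that ordering (the queue name and jar path are just example values):
sqoop import \
-D mapreduce.job.queuename=etl \
-libjars /opt/jdbc/mysql-connector-java.jar \
--connect jdbc:mysql://localhost:3306/db1 \
--username root \
--table tableName
Note that the generic arguments (-D, -libjars) come right after the tool name (import), before any Sqoop-specific -- options.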

Sqoop creating insert statements containing multiple records

We are trying to load data into Netezza using Sqoop, and we are facing the following issue.
java.io.IOException: org.netezza.error.NzSQLException: ERROR:
An example input dataset is shown below:
1,2,3
1,3,4
The sqoop command is shown below:
sqoop export --table <tablename> --export-dir <path> \
--input-fields-terminated-by '\t' --input-lines-terminated-by '\n' \
--connect 'jdbc:netezza://<host>/<db>' --driver org.netezza.Driver \
--username <username> --password <passwrd>
Sqoop is generating the insert statement in the following way:
insert into (c1,c2,c3) values (1,2,3),(1,3,4)
We are able to load a single record, but when we try to load multiple records per statement, we get the error above.
Your help is highly appreciated.
Setting sqoop.export.records.per.statement=1 will definitely help, but it will make the export process extremely slow if your export record count is very large, say 5 million.
To solve this you need to add the following things:
1.) A properties file, sqoop.properties; it must contain the property jdbc.transaction.isolation=TRANSACTION_READ_UNCOMMITTED (it avoids deadlocks during exports). In the export command you then need to specify (see the sketch after this list):
--connection-param-file /path/to/sqoop.properties
2.) Also set sqoop.export.records.per.statement=100; this will increase the speed of the export.
3.) Third, you have to add --batch, to use batch mode for the underlying statement execution.
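For reference, the sqoop.properties file from step 1 is a plain Java properties file; a minimal sketch (the path is up to you):
# /path/to/sqoop.properties — JDBC connection parameters for --connection-param-file
jdbc.transaction.isolation=TRANSACTION_READ_UNCOMMITTED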
So your final export will look like this:
sqoop export -D sqoop.export.records.per.statement=100 \
--table <tablename> --export-dir <path> \
--input-fields-terminated-by '\t' --input-lines-terminated-by '\n' \
--connect 'jdbc:netezza://<host>/<db>' --driver org.netezza.Driver \
--username <username> --password <passwrd> \
--connection-param-file /path/to/sqoop.properties \
--batch
Hope this will help.
You can customise the number of rows used in one insert statement with the property sqoop.export.records.per.statement. For example, for Netezza you would need to set it to 1:
sqoop export -Dsqoop.export.records.per.statement=1 --connect ...
I would also recommend taking a look at the Apache Sqoop Cookbook, where this and many other tips are described.
