Sqoop job unable to work with Hadoop Credential API - hadoop

I have stored my database passwords in Hadoop CredentialProvider.
Sqoop import from terminal is working fine, successfully fetching the password from CredentialProvider.
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks \
  --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay --password-alias db2-dev-password
But when I try to set it up as a Sqoop job, the -Dhadoop.security.credential.provider.path argument is not recognized:
sqoop job --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --password-alias
Following is the error message:
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: --password-alias
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: db2-dev-password
I couldn't find any special instructions in the Sqoop User Guide for configuring the Hadoop credential API with a Sqoop job.
How can I resolve this issue?

Repositioning the Sqoop parameters solves the problem:
sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks \
  --create my-sqoop-job \
  -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay --password-alias myPasswordAlias
Place the generic Hadoop option (-D...) right after sqoop job, before the --create keyword and the tool-specific arguments.
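For example, once the job is created it can be checked and run with the standard sqoop job sub-commands. Passing the same -D option again before --exec is an assumption here, in case the provider path is not already configured in core-site.xml:
# list and inspect the saved job
sqoop job --list
sqoop job --show my-sqoop-job
# run it; the generic -D option again goes before the sub-command
sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --exec my-sqoop-job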

Your Sqoop job command is incomplete: --password-alias is missing its value.
Execute the command below on your Hadoop server:
hadoop credential list -provider jceks://hdfs/user/vijay/myPassword.jceks
Then use the listed alias in the Sqoop job command below:
sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay --password-alias <<output of above command>>
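If the provider does not contain an alias yet, one can be created first. This is a sketch; the alias name db2-dev-password is taken from the question:
# store the database password under the alias (prompts for the password)
hadoop credential create db2-dev-password -provider jceks://hdfs/user/vijay/myPassword.jceks
# confirm it is listed
hadoop credential list -provider jceks://hdfs/user/vijay/myPassword.jceks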

Related

SemanticException 10072: database does not exist (Sqoop)

I created a Hive internal table using a Sqoop command.
sqoop import -Dmapreduce.map.memory.mb=4096 \
  --driver com.mysql.jdbc.Driver \
  --connect 'jdbc:mysql://{mysql_url}' \
  --username 'xxxx' \
  --password 'xxxx' \
  --input-fields-terminated-by '\t' \
  --split-by id \
  --target-dir {hdfs_path} \
  --verbose -m 1 \
  --hive-drop-import-delims \
  --fields-terminated-by '\t' \
  --hive-import \
  --hive-table '{table_name}' \
  --query "select id from temp WHERE \$CONDITIONS LIMIT 10"
The table was created and the import appeared to run fine:
19/01/06 19:33:44 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://hadoop/{hdfs_path}' INTO TABLE `tmp.temp`
19/01/06 19:33:44 INFO hive.HiveImport: Loading uploaded data into Hive
19/01/06 19:33:44 DEBUG hive.HiveImport: Using in-process Hive instance.
19/01/06 19:33:44 DEBUG util.SubprocessSecurityManager: Installing subprocess security manager
Logging initialized using configuration in jar:file:${HADOOP_HOME}/hive-1.1.0-cdh5.14.2/lib/hive-common-1.1.0-cdh5.14.2.jar!/hive-log4j.properties
The files were created in the HDFS location:
$ hadoop dfs -ls {hdfs_path}
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
19/01/06 19:43:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
0 2019-01-06 19:33 {hdfs_path}/_SUCCESS
65 2019-01-06 19:33 {hdfs_path}/part-m-00000.gz
But then it failed with this error:
FAILED: SemanticException [Error 10072]: Database does not exist: tmp
I have already copied hive-site.xml into the Sqoop conf directory:
cp ${HIVE_HOME}/conf/hive-site.xml ${SQOOP_HOME}/conf/hive-site.xml
"hive.metastore.uris" was set local and remote thrift.
What should I do? Thanks.
Please use this Sqoop command to import data into an existing Hive table, and change the clauses according to your requirements. I have modified some clauses and added others to your command:
ubuntu@localhost:/usr/local/hive$ sqoop import -Dmapreduce.map.memory.mb=4096 \
  --connect 'jdbc:mysql://localhost/test' --username 'root' -P \
  --input-fields-terminated-by ',' --split-by id \
  --target-dir /user/hive/warehouse/test_hive \
  --hive-drop-import-delims --fields-terminated-by ',' \
  --hive-import --hive-database default --hive-table test_hive \
  --query "select id from test WHERE \$CONDITIONS LIMIT 10" \
  --driver com.mysql.jdbc.Driver --delete-target-dir
Happy Hadooping!
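Alternatively, if the goal is to keep loading into tmp.temp, the SemanticException just means the tmp database does not exist in the metastore Sqoop is talking to. A minimal sketch, assuming access to the Hive CLI, is to create it and point Sqoop at it explicitly:
# create the missing database once
hive -e "CREATE DATABASE IF NOT EXISTS tmp;"
# then reference it explicitly in the import instead of embedding it in the table name:
# --hive-database tmp --hive-table temp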

Sqoop export from hdfs to GreenPlum is not working

I am trying to export data from an HDFS location to a Greenplum user-defined schema (not the default schema).
I tried sqoop eval to check the connection:
sqoop eval --connect "jdbc:postgresql://sample.com:5432/sampledb" --username sample_user --password xxxx --query "SELECT * FROM sample_db.sample_table LIMIT 3"
Result:
working fine
Then I tried the export with the --schema option:
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" --username sampleuser --password samplepassword --table sample_table --schema sample_schema --export-dir=/sample/gp_export --input-fields-terminated-by ',' --update-mode allowinsert
Result:
Warning: /usr/hdp/2.3.6.0-3796/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/06/25 11:09:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.6.0-3796
18/06/25 11:09:41 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Error parsing arguments for export:
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --schema
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: sample_schema
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --export-dir=/sample/gp_export
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --input-fields-terminated-by
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: ,
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --update-mode
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: allowinsert
I added an extra '--' before '--schema' based on the Sqoop documentation:
https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" --username sampleuser --password samplepassword --table sample_table -- --schema sample_schema --export-dir=/sample/gp_export --input-fields-terminated-by ',' --update-mode allowinsert
Result:
Warning: /usr/hdp/2.3.6.0-3796/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/06/25 11:06:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.6.0-3796
18/06/25 11:06:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
Export requires an --export-dir argument or --hcatalog-table argument.
Try --help for usage instructions.
Could someone guide me on this? Thanks.
Thanks to @cricket_007 for the clarification.
The --schema argument should come last in the Sqoop command, after a standalone --. So the command below works:
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" \
--username sampleuser --password samplepassword \
--export-dir=/sample/gp_export --input-fields-terminated-by ',' \
--table sample_table -- --schema sample_schema
But UPSERT operations (--update-mode allowinsert) are not supported for PostgreSQL. There is an open Jira ticket:
https://issues.apache.org/jira/browse/SQOOP-1270
After --export-dir you don't need the =; check out the example below. Another suggestion is to use --verbose when you run into these kinds of issues.
sqoop export --libjars /path/some.jar \
--connect 'jdbc:sqlserver://IP:1433;database=db' \
--username someName -password somePassword -m 10 \
--verbose --mysql-delimiters \
--export-dir /HDFS/Path/someFile.csv \
--table "RDBMSTABLENAME"

Getting Error When trying to Import Data From Oracle to HIVE

I am running the command below to import data from an Oracle DB to Hive:
sqoop import --connect jdbc:oracle:thin:@//localhost:1521/newDB --username <USERNAME> -P --table <ORACLE_TABLE_NAME> --hive-table <HIVE_TABLE_NAME> --hive-import -m 1
I am getting the error below when I run this command:
17/11/21 05:05:46 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
17/11/21 05:05:46 INFO manager.SqlManager: Using default fetchSize of 1000
17/11/21 05:05:46 INFO tool.CodeGenTool: Beginning code generation
17/11/21 05:05:47 INFO manager.OracleManager: Time zone has been set to GMT
17/11/21 05:05:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "<TABLE_NAME>" t WHERE 1=0
17/11/21 05:05:47 ERROR tool.ImportTool: Imported Failed: There is no column found in the target table <TABLE_NAME>. Please ensure that your table name is correct.
Check the owner of the table and try using table_owner.table_name for the Oracle source.
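A minimal sketch of that change, assuming a hypothetical owner schema called MYSCHEMA; Oracle identifiers are normally stored in uppercase, so passing the owner and table name in uppercase often avoids the "no column found" error as well:
sqoop import --connect jdbc:oracle:thin:@//localhost:1521/newDB --username <USERNAME> -P \
  --table MYSCHEMA.<ORACLE_TABLE_NAME> \
  --hive-table <HIVE_TABLE_NAME> --hive-import -m 1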
I see that you have not used --create-hive-table and a few other parameters in your query.
Below is the Sqoop import command I use in my project; oracle_connection.txt holds the connection info:
sqoop --options-file oracle_connection.txt \
--table $DATABASE.$TABLENAME \
-m $NUMMAPPERS \
--where "$CONDITION" \
--hive-import \
--map-column-hive "$COLLIST" \
--create-hive-table \
--hive-drop-import-delims \
--split-by $SPLITBYCOLUMN \
--hive-table $HIVEDATABASE.$TABLENAME \
--bindir sqoop_hive_rxhome/bindir/ \
--outdir sqoop_hive_rxhome/outdir
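For reference, a Sqoop options file lists one option or value per line, and lines starting with # are comments. A hypothetical oracle_connection.txt might look like this; the connection string and username are placeholders:
# Oracle source connection for sqoop --options-file
import
--connect
jdbc:oracle:thin:@//dbhost:1521/ORCL
--username
scott
-P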

sqoop import error from teradata to hive

I am using the Sqoop command given below:
sqoop import \
  --libjars /usr/hdp/2.4.0.0-169/sqoop/lib,/usr/hdp/2.4.0.0-169/hive/lib \
  --connect jdbc:teradata://x/DATABASE=x \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username ec \
  --password dc \
  --query "select * from hb where yr_nbr=2017" \
  --hive-table schema.table \
  --num-mappers 1 \
  --hive-import \
  --target-dir /user/hive/warehouse/GG
I'm getting this error:
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
17/04/06 11:15:41 INFO mapreduce.Job: map 100% reduce 0%
17/04/06 11:15:41 INFO mapreduce.Job: Task Id : attempt_1491466460468_0029_m_000000_1, Status : FAILED
Error: org.apache.hadoop.fs.FileAlreadyExistsException: /user/root/temp_111508/part-m-00000 for client 192.168.211.133 already exists
From the error, I can guess that the output file already exists in your target directory, possibly from a previous Sqoop import. Sqoop import has an option named --delete-target-dir, which deletes your target output directory and re-creates it during the next import. Hope that helps.
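For example, the same command with that flag added (a sketch based on the command in the question); whether it also clears the connector's temporary directory shown in the error (/user/root/temp_111508) is not guaranteed, so that path may still need to be removed manually with hdfs dfs -rm -r:
sqoop import \
  --libjars /usr/hdp/2.4.0.0-169/sqoop/lib,/usr/hdp/2.4.0.0-169/hive/lib \
  --connect jdbc:teradata://x/DATABASE=x \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username ec --password dc \
  --query "select * from hb where yr_nbr=2017" \
  --hive-table schema.table \
  --num-mappers 1 --hive-import \
  --target-dir /user/hive/warehouse/GG \
  --delete-target-dir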

Apache Sqoop import error

I am getting the following error while trying to import a table from MySQL to HDFS. I am using the Cloudera CDH 4 VM.
[cloudera@localhost ~]$ sqoop import --connect jdbc:mysql://localhost/mydatabase\
> --user root\
> --password aaaaaaaa\
> --table bike
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/08 11:44:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.3-cdh4.7.0
14/12/08 11:44:41 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
14/12/08 11:44:41 ERROR tool.BaseSqoopTool: Unrecognized argument: root--password
14/12/08 11:44:41 ERROR tool.BaseSqoopTool: Unrecognized argument: aaaaaaaa--table
14/12/08 11:44:41 ERROR tool.BaseSqoopTool: Unrecognized argument: bike
Try --help for usage instructions.
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]
Try using this:
sqoop import --connect jdbc:mysql://mysql_server:3306/mydatabase --username root --password aaaaaaaa --table bike ;
You should use --username instead of --user. Also note that your original command has no space before each line-continuation backslash, which is why the shell glued the arguments together (as seen in errors like "Unrecognized argument: root--password").
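A multi-line form of the corrected command, with a space before each backslash so the arguments stay separated (host, credentials, and table name are taken from the question):
sqoop import --connect jdbc:mysql://localhost/mydatabase \
  --username root \
  --password aaaaaaaa \
  --table bike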
