Sqoop export from hdfs to GreenPlum is not working - hadoop

I am trying to export data from an HDFS location to a Greenplum user-defined schema (not the default schema).
I tried sqoop eval to check the connection:
sqoop eval --connect "jdbc:postgresql://sample.com:5432/sampledb" --username sample_user --password xxxx --query "SELECT * FROM sample_db.sample_table LIMIT 3"
Result:
Works fine.
Then I tried the export with the --schema option:
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" --username sampleuser --password samplepassword --table sample_table --schema sample_schema --export-dir=/sample/gp_export --input-fields-terminated-by ',' --update-mode allowinsert
Result:
Warning: /usr/hdp/2.3.6.0-3796/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/06/25 11:09:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.6.0-3796
18/06/25 11:09:41 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Error parsing arguments for export:
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --schema
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: sample_schema
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --export-dir=/sample/gp_export
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --input-fields-terminated-by
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: ,
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --update-mode
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: allowinsert
Then I added an extra '--' before '--schema', based on the Sqoop documentation (https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html):
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" --username sampleuser --password samplepassword --table sample_table -- --schema sample_schema --export-dir=/sample/gp_export --input-fields-terminated-by ',' --update-mode allowinsert
Result:
Warning: /usr/hdp/2.3.6.0-3796/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/06/25 11:06:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.6.0-3796
18/06/25 11:06:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
Export requires an --export-dir argument or --hcatalog-table argument.
Try --help for usage instructions.
Could someone guide me on this? Thanks.

Thanks to @cricket_007 for the clarification.
The --schema argument should come last in the Sqoop command, after a standalone '--', so the command below works:
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" \
--username sampleuser --password samplepassword \
--export-dir=/sample/gp_export --input-fields-terminated-by ',' \
--table sample_table -- --schema sample_schema
But UPSERT operations (--update-mode allowinsert) are not supported for PostgreSQL. There is an open Jira ticket here:
https://issues.apache.org/jira/browse/SQOOP-1270
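As a hedged workaround sketch (not verified against Greenplum): dropping --update-mode allowinsert and running a plain insert export, or doing an update-only export with --update-key, stays within what the generic PostgreSQL JDBC export supports. The id column below is only a placeholder for your table's key column:
# update-only export; 'id' is a placeholder for the table's key column
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" \
--username sampleuser --password samplepassword \
--export-dir /sample/gp_export --input-fields-terminated-by ',' \
--update-key id --update-mode updateonly \
--table sample_table -- --schema sample_schema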

After --export-dir you don't need the '='; check out the example below. Another suggestion is to use --verbose when you run into these kinds of issues.
sqoop export --libjars /path/some.jar \
--connect 'jdbc:sqlserver://IP:1433;database=db' \
--username someName -password somePassword -m 10 \
--verbose --mysql-delimiters \
--export-dir /HDFS/Path/someFile.csv \
--table "RDBMSTABLENAME"

Related

Cloudera Sqoop exception while creating a job with the sqoop command

I have the VM cloudera-quickstart-vm-5.13.0-0-virtualbox running, and I execute the following command:
sudo sqoop --options-file /home/cloudera/sqoop-job/sqoop-job3.txt
Test #1:
The content of the file (sqoop-job3.txt) is the following:
job
--create
jobCreateWarehouse_EMP
-- import --connect 'jdbc:oracle:thin:@//localhost:1521/xe' --username 'SYSTEM' --password '123123' --hive-import -e 'SELECT t.* FROM EMP t WHERE $CONDITIONS' --target-dir '/home/cloudera/EMP' --hive-table 'EMP' --split-by 't.EMPNO'
The Exception:
19/06/07 01:02:09 ERROR sqoop.Sqoop: Error while expanding arguments
java.lang.Exception: Malformed option in options file(/home/cloudera/sqoop-job/sqoop-job3.txt): -- import --connect 'jdbc:oracle:thin:@//localhost:1521/xe' --username 'SYSTEM' --password '123123' --hive-import -e 'SELECT t.* FROM EMP t WHERE $CONDITIONS' --target-dir '/home/cloudera/EMP' --hive-table 'EMP' --split-by 't.EMPNO'
at org.apache.sqoop.util.OptionsFileUtil.removeQuoteCharactersIfNecessary(OptionsFileUtil.java:170)
at org.apache.sqoop.util.OptionsFileUtil.removeQuotesEncolosingOption(OptionsFileUtil.java:143)
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:90)
at com.cloudera.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:33)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:215)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Malformed option in options file(/home/cloudera/sqoop-job/sqoop-job3.txt): -- import --connect 'jdbc:oracle:thin:@//localhost:1521/xe' --username 'SYSTEM' --password '123123' --hive-import -e 'SELECT t.* FROM EMP t WHERE $CONDITIONS' --target-dir '/home/cloudera/EMP' --hive-table 'EMP' --split-by 't.EMPNO'
Try 'sqoop help' for usage.
Test #2:
I have changed the content of the file (sqoop-job3.txt) to the following:
job
--create
jobCreateWarehouse_EMP
-- import
--connect
'jdbc:oracle:thin:@//localhost:1521/xe'
--username
'SYSTEM'
--password
'123123'
--hive-import
-e
'SELECT t.* FROM EMP t WHERE $CONDITIONS'
--target-dir
'/home/cloudera/EMP'
--hive-table
'EMP'
--split-by
't.EMPNO'
It shows the following exception:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/06/07 01:10:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Error parsing arguments for job:
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: -- import
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: jdbc:oracle:thin:@//localhost:1521/xe
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: SYSTEM
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --password
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: 123123
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-import
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: -e
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: SELECT t.* FROM EMP t WHERE $CONDITIONS
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --target-dir
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: /home/cloudera/EMP
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: EMP
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --split-by
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: t.EMPNO
Try --help for usage instructions.
How can I create an options file that creates a job with an import? An options file has to be created for the import command in the first version of Sqoop; I have created a job in Hue for this configuration.
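Not a definitive answer, but for reference the options-file layout documented in the Sqoop user guide puts the tool name on the first line and every option and its value on separate lines (lines starting with # are comments). Whether the job --create ... -- import form survives being split into an options file is not confirmed here, so the sketch below only shows a plain import options file built from the question's values (the file name is hypothetical):
# /home/cloudera/sqoop-job/import-emp.txt  (hypothetical file name)
import
--connect
jdbc:oracle:thin:@//localhost:1521/xe
--username
SYSTEM
--password
123123
--target-dir
/home/cloudera/EMP
--hive-import
--hive-table
EMP
--split-by
t.EMPNO
-e
'SELECT t.* FROM EMP t WHERE $CONDITIONS'
It would then be run with: sqoop --options-file /home/cloudera/sqoop-job/import-emp.txt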

sqoop job --create getting errored out

Here I am trying to create a Sqoop job, but it throws an error.
FYI: when I tried a direct sqoop import, it worked fine.
But when I try to create a job for it, it shows an error.
sqoop job --create myjob \
--import \
--connect jdbc:mysql://ip-171-33-113-14:3306/sqooped \
--username squser \
--password ABCD1234 \
--table sac01 \
--m 1
ERROR info
17/05/04 08:59:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.4.0-3485
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Error parsing arguments for job:
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --import
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: jdbc:mysql://ip-171-33-113-14:3306/sqooped
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: squser
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --password
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: ABCD1234
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: sac01
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --m
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: 1
Sqoop job syntax:
sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
So there should be a space between -- and import in your command.
Try this:
sqoop job --create myjob \
-- import \
--connect jdbc:mysql://ip-171-33-113-14:3306/sqooped \
--username squser \
--password ABCD1234 \
--table sac01 \
--m 1
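Once created, the saved job can be inspected and executed with the other job sub-commands:
sqoop job --list          # list saved jobs
sqoop job --show myjob    # show the stored definition
sqoop job --exec myjob    # run the import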

sqoop import-all-tables : not working

CDH version: 5.5.0-0
The Hive process is up and running, no issues.
I am trying to import tables from MySQL into Hive using the script below, but the tables are not being imported into Hive. Can you please help me work out what the issue is, or what I am missing?
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=reatil_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compresession-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files
ERROR:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/12 06:36:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.5.0
16/10/12 06:36:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Error parsing arguments for import-all-tables:
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Unrecognized argument: --compresession-codec
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Unrecognized argument: org.apache.hadoop.io.compress.SnappyCodec
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Unrecognized argument: --outdir
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Unrecognized argument: java_files
There is a typo: the argument name should be --compression-codec instead of --compresession-codec.
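For reference, here is the same command from the question with only that flag corrected (all other values copied verbatim):
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=reatil_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files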

Sqoop job unable to work with Hadoop Credential API

I have stored my database passwords in Hadoop CredentialProvider.
Sqoop import from terminal is working fine, successfully fetching the password from CredentialProvider.
sqoop import
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks
--table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay --password-alias db2-dev-password
But when I try to set it up as a Sqoop job, it is unable to recognize the -Dhadoop.security.credential.provider.path argument.
sqoop job --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --password-alias
Following is the error message:
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: --password-alias
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: db2-dev-password
I couldn't find any special instructions in the Sqoop User Guide for configuring the Hadoop credential API with a Sqoop job.
How can I resolve this issue?
Repositioning the Sqoop parameters solves the problem.
sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay --password-alias myPasswordAlias
Place the Hadoop credential property right after sqoop job, before the job-specific arguments, as shown above.
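For completeness, a sketch of the full flow, with the provider path and alias taken from the question; hadoop credential create prompts for the database password:
# store the password once under the alias
hadoop credential create db2-dev-password \
-provider jceks://hdfs/user/vijay/myPassword.jceks

# generic -D options go right after 'sqoop job', before the tool arguments
sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks \
--create my-sqoop-job -- import --table myTable -m 1 \
--target-dir /user/vijay/output --delete-target-dir \
--username vijay --password-alias db2-dev-password

# execute the saved job (the -D property may need to be passed here as well)
sqoop job --exec my-sqoop-job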
Your Sqoop job command is not correct: the --password-alias argument is incomplete.
Please execute the command below on your Hadoop server:
hadoop credential list -provider jceks://hdfs/user/vijay/myPassword.jceks
Add the output to the Sqoop job command below:
sqoop job --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --password-alias <<output of above command>>

Apache Sqoop import error

I am getting the following error while trying to import a table from MySQL to HDFS. I am using the Cloudera CDH 4 VM.
[cloudera@localhost ~]$ sqoop import --connect jdbc:mysql://localhost/mydatabase\
> --user root\
> --password aaaaaaaa\
> --table bike
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/12/08 11:44:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.3-cdh4.7.0
14/12/08 11:44:41 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
14/12/08 11:44:41 ERROR tool.BaseSqoopTool: Unrecognized argument: root--password
14/12/08 11:44:41 ERROR tool.BaseSqoopTool: Unrecognized argument: aaaaaaaa--table
14/12/08 11:44:41 ERROR tool.BaseSqoopTool: Unrecognized argument: bike
Try --help for usage instructions.
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]
Try using this:
sqoop import --connect jdbc:mysql://mysql_server:3306/mydatabase --username root --password aaaaaaaa --table bike ;
You should use --username instead of --user.
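If you prefer the multi-line form, also leave a space before each trailing backslash; in the original command the missing spaces are what glued the tokens together into root--password and aaaaaaaa--table. A spaced-out sketch with the question's values:
sqoop import \
--connect jdbc:mysql://localhost/mydatabase \
--username root \
--password aaaaaaaa \
--table bike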
