Cloudera Sqoop Exception, while it's creating a job by the scoop command - sqoop

I have an VM: cloudera-quickstart-vm-5.13.0-0-virtualbox, run now.
I execute the next command:
sudo sqoop --options-file /home/cloudera/sqoop-job/sqoop-job3.txt
Test #1:
The content of file #1 (sqoop-job3.txt) is the next:
job
--create
jobCreateWarehouse_EMP
-- import --connect 'jdbc:oracle:thin:#//localhost:1521/xe' --username 'SYSTEM' --password '123123' --hive-import -e 'SELECT t.* FROM EMP t WHERE $CONDITIONS' --target-dir '/home/cloudera/EMP' --hive-table 'EMP' --split-by 't.EMPNO'
The Exception:
19/06/07 01:02:09 ERROR sqoop.Sqoop: Error while expanding arguments
java.lang.Exception: Malformed option in options file(/home/cloudera/sqoop-job/sqoop-job3.txt): -- import --connect 'jdbc:oracle:thin:#//localhost:1521/xe' --username 'SYSTEM' --password '123123' --hive-import -e 'SELECT t.* FROM EMP t WHERE $CONDITIONS' --target-dir '/home/cloudera/EMP' --hive-table 'EMP' --split-by 't.EMPNO'
at org.apache.sqoop.util.OptionsFileUtil.removeQuoteCharactersIfNecessary(OptionsFileUtil.java:170)
at org.apache.sqoop.util.OptionsFileUtil.removeQuotesEncolosingOption(OptionsFileUtil.java:143)
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:90)
at com.cloudera.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:33)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:215)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Malformed option in options file(/home/cloudera/sqoop-job/sqoop-job3.txt): -- import --connect 'jdbc:oracle:thin:#//localhost:1521/xe' --username 'SYSTEM' --password '123123' --hive-import -e 'SELECT t.* FROM EMP t WHERE $CONDITIONS' --target-dir '/home/cloudera/EMP' --hive-table 'EMP' --split-by 't.EMPNO'
Try 'sqoop help' for usage.
Test #2:
I have changed a content of the file (sqoop-job3.txt) on the next:
job
--create
jobCreateWarehouse_EMP
-- import
--connect
'jdbc:oracle:thin:#//localhost:1521/xe'
--username
'SYSTEM'
--password
'123123'
--hive-import
-e
'SELECT t.* FROM EMP t WHERE $CONDITIONS'
--target-dir
'/home/cloudera/EMP'
--hive-table
'EMP'
--split-by
't.EMPNO'
It shows the next exception:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/06/07 01:10:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Error parsing arguments for job:
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: -- import
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: jdbc:oracle:thin:#//localhost:1521/xe
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: SYSTEM
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --password
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: 123123
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-import
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: -e
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: SELECT t.* FROM EMP t WHERE $CONDITIONS
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --target-dir
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: /home/cloudera/EMP
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: EMP
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --split-by
19/06/07 01:10:36 ERROR tool.BaseSqoopTool: Unrecognized argument: t.EMPNO
Try --help for usage instructions.
How Can I create an option file to create a job with an import?

There is an Option file have to be created for the import command in the first version of Sqoop. I have created a job in the Hue for an confoguration.

Related

Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a param

I've command to import sql from sqlserver to hive as below
sqoop import --connect 'jdbc:sqlserver://10.0.2.11:1433;database=SP2010' --username pbddms -P --table daily_language --hive-import --hive-database test_hive --hive-table daily_language --hive-overwrite --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'
But it result
19/02/22 09:10:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.5.0-292
19/02/22 09:10:24 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/02/22 09:10:24 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/02/22 09:10:24 INFO manager.SqlManager: Using default fetchSize of 1000
19/02/22 09:10:24 INFO tool.CodeGenTool: Beginning code generation
19/02/22 09:10:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [daily_language] AS t WHERE 1=0
19/02/22 09:10:25 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.5.0-292/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/ddab816638bd5e65108647177ab703b0/daily_language.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/02/22 09:10:27 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ddab816638bd5e65108647177ab703b0/daily_language.jar
19/02/22 09:10:27 INFO mapreduce.ImportJobBase: Beginning import of daily_language
19/02/22 09:10:29 INFO client.RMProxy: Connecting to ResourceManager at mghdop01.dcdms/10.0.37.157:8050
19/02/22 09:10:29 INFO client.AHSProxy: Connecting to Application History server at mghdop01.dcdms/10.0.37.157:10200
19/02/22 09:10:31 INFO db.DBInputFormat: Using read commited transaction isolation
19/02/22 09:10:31 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([kdbahasa]), MAX([kdbahasa]) FROM [daily_language]
19/02/22 09:10:31 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1547085556146_0680
19/02/22 09:10:31 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:204)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:163)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Caused by: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
at org.apache.sqoop.mapreduce.db.TextSplitter.split(TextSplitter.java:67)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:201)
... 23 more
Why there is
ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: Generating splits for a textual index column
allowed only in case of
"-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed
as a parameter
Although I don't give split-by in the sqoop import above.
First, how I can solve the case above?
Then I try to add "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" in sqoop import above, but it give me another error below;
19/02/22 09:20:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.5.0-292
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: Dorg.apache.sqoop.splitter.allow_text_splitter=true
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: pbddms
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: -P
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: daily_language
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-import
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-database
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: test_hive
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: daily_language
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-overwrite
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-drop-import-delims
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --null-string
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: \\N
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --null-non-string
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: \\N\
Second case, how I can solve the case above?
In my case I had to put this part in a double quotes:
sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true"
I takes kdbahasa column as a split-column. Add -m 1 parameter to specify the number of mappers. 1 - means it will run on single mappers without splits:
sqoop import --connect 'jdbc:sqlserver://10.0.2.11:1433;database=SP2010' --username pbddms -P --table daily_language --hive-import -m 1 --hive-database test_hive --hive-table daily_language --hive-overwrite --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'
Also read this about split_column if you want to split: https://stackoverflow.com/a/37389134/2700344
Faced similar issue found this is how we need to pass that sqoop property:
sqoop import -D org.apache.sqoop.splitter.allow_text_splitter=true --connect 'jdbc:sqlserver://10.0.2.11:1433;database=SP2010' --username pbddms -P --table daily_language --hive-import -m 1 --hive-database test_hive --hive-table daily_language --hive-overwrite --hive-drop-import-delims --null-string '\N' --null-non-string '\N'

Sqoop export from hdfs to GreenPlum is not working

I am trying to export data from hdfs location to Greenplum user defined schema (not default schema).
Tried Sqoop Eval to check the connection.
sqoop eval --connect "jdbc:postgresql://sample.com:5432/sampledb" --username sample_user --password xxxx --query "SELECT * FROM sample_db.sample_table LIMIT 3"
Result:
working fine
Tried with --schema option
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" --username sampleuser --password samplepassword --table sample_table --schema sample_schema --export-dir=/sample/gp_export --input-fields-terminated-by ',' --update-mode allowinsert
Result:
Warning: /usr/hdp/2.3.6.0-3796/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/06/25 11:09:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.6.0-3796
18/06/25 11:09:41 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Error parsing arguments for export:
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --schema
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: sample_schema
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --export-dir=/sample/gp_export
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --input-fields-terminated-by
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: ,
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: --update-mode
18/06/25 11:09:41 ERROR tool.BaseSqoopTool: Unrecognized argument: allowinsert
Added extra '--' before '--schema' based on the sqoop documentation
https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" --username sampleuser --password samplepassword --table sample_table -- --schema sample_schema --export-dir=/sample/gp_export --input-fields-terminated-by ',' --update-mode allowinsert
Result:
Warning: /usr/hdp/2.3.6.0-3796/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/06/25 11:06:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.6.0-3796
18/06/25 11:06:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
Export requires an --export-dir argument or --hcatalog-table argument.
Try --help for usage instructions.
Could someone guide me on this. Thanks
Thanks to #cricket_007 for clarification.
--schema argument should be last in the sqoop command. So below code is working.
/usr/bin/sqoop export --connect "jdbc:postgresql://sample.com:5432/sampledb" \
--username sampleuser --password samplepassword \
--export-dir=/sample/gp_export --input-fields-terminated-by ',' \
--table sample_table -- --schema sample_schema
But UPSERT operations not supported in postgresSql. There is an open Jira ticket here.
https://issues.apache.org/jira/browse/SQOOP-1270
After the --export-dir you don't need the = check out the example below. Another suggestion is to use --verbose when you run into these kinds of issues.
sqoop export --libjars /path/some.jar \
--connect 'jdbc:sqlserver://IP:1433;database=db' \
--username someName -password somePassword -m 10 \
--verbose --mysql-delimiters \
--export-dir /HDFS/Path/someFile.csv \
--table "RDBMSTABLENAME"

sqoop job --create getting Errored out

Here I am trying to create a sqoop Job,but its throwing error
FYI : When I tried direct sqoop import it works fine.
But When I need to create a job for this , at that time its showing error
sqoop job --create myjob \
--import \
--connect jdbc:mysql://ip-171-33-113-14:3306/sqooped \
--username squser \
--password ABCD1234 \
--table sac01 \
--m 1
ERROR info
17/05/04 08:59:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.4.0-3485
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Error parsing arguments for job:
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --import
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: jdbc:mysql://ip-171-33-113-14:3306/sqooped
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: squser
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --password
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: ABCD1234
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: sac01
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: --m
17/05/04 08:59:49 ERROR tool.BaseSqoopTool: Unrecognized argument: 1
Sqoop job syntax:
sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
So, there should be space between -- and import in your command.
Try this:
sqoop job --create myjob \
-- import \
--connect jdbc:mysql://ip-171-33-113-14:3306/sqooped \
--username squser \
--password ABCD1234 \
--table sac01 \
--m 1

ERROR tool.BaseSqoopTool: Error parsing arguments for job: Sqoop i have tried to create a job in sqoop, but the following error occured

sqoop job --create myjob --import --connect "jdbc:mysql://localhost/classicmodels" --username root --password 123 --table customers -m 1 --taget-dir /manoj280217/sqoop
Error:
17/02/28 08:56:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Error parsing arguments for job:
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --import
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: jdbc:mysql://localhost/classicmodels
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: root
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --password
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: 123
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: customers
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: -m
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: 1
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: --taget-dir
17/02/28 08:56:18 ERROR tool.BaseSqoopTool: Unrecognized argument: /manoj280217/sqoop
Syntax for sqoop job is
sqoop-job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
Your command should be
sqoop job --create myjob -- import --connect "jdbc:mysql://localhost/classicmodels" --username root --password 123 --table customers -m 1 --taget-dir /manoj280217/sqoop
See the space between -- and import
this sounds weird but this is how you fix the problem: Make sure you manually type the whole command instead of copy/pasting it. ps also " " are not needed for jdbc
most of the time issue arises only because of spacing and quotations.This is always the case with me. Retype the command and take care of the same.

sqoop import-all-tables : not working

CDH version = 5.5.0-0
Hive process is up & running - No issues
I try to import tables from MySQL into hive using the below script.Tables not importing into Hive.Can you please help me what is the issue or am I missing something?
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=reatil_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compresession-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files
ERROR:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/12 06:36:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.5.0
16/10/12 06:36:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Error parsing arguments for import-all-tables:
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Unrecognized argument: --compresession-codec
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Unrecognized argument: org.apache.hadoop.io.compress.SnappyCodec
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Unrecognized argument: --outdir
16/10/12 06:36:21 ERROR tool.BaseSqoopTool: Unrecognized argument: java_files
There is a typo, argument name should be --compression-codec instead of --compresession-codec

Resources