Sqoop - problems with --options-file parameter - oracle

I created a file called database.txt whose content is:
--connect jdbc:oracle:thin:@//123.123.123.123:123/database --username user --password pass
And when I try to use it in the sqoop command as follows:
sqoop import --options-file database.txt --table MyTable \
--as-avrodatafile --null-string '\N' --null-non-string '\N' \
--compress --compression-codec org.apache.hadoop.io.compress.DeflateCodec \
--target-dir hdfs:///user/myuser/AvroFiles/table --split-by SRC \
--enclosed-by '\"' --map-column-java CREATED_DATT=String,UPDATED_DATT=String
I get the following error:
17/03/15 11:06:23 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
17/03/15 11:06:23 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect jdbc:oracle:thin:@//123.123.123.123:123/database --username user --password pass
17/03/15 11:06:23 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
17/03/15 11:06:23 ERROR tool.BaseSqoopTool: Unrecognized argument: MyTable
17/03/15 11:06:23 ERROR tool.BaseSqoopTool: Unrecognized argument: --as-avrodatafile
...
However, if I write the connection parameters directly, everything works fine:
sqoop import --connect jdbc:oracle:thin:@//123.123.123.123:123/database \
--username user --password pass --table MyTable --as-avrodatafile \
--null-string '\N' --null-non-string '\N' --compress \
--compression-codec org.apache.hadoop.io.compress.DeflateCodec \
--target-dir hdfs:///user/myuser/AvroFiles/table --split-by SRC \
--enclosed-by '\"' --map-column-java CREATED_DATT=String,UPDATED_DATT=String
What am I doing wrong?

An options file is a text file where each line identifies an option in the order that it would otherwise appear on the command line; each flag and each value must be on its own line.
Modify database.txt:
--connect
jdbc:oracle:thin:@//123.123.123.123:123/database
--username
user
--password
pass
Check the Sqoop documentation for more details.
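With database.txt laid out one option per line like that, the original command from the question should parse as-is; an abbreviated sketch of the invocation (same placeholder paths as in the question):
sqoop import --options-file database.txt --table MyTable \
--as-avrodatafile --target-dir hdfs:///user/myuser/AvroFiles/table \
--split-by SRC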

Related

Can we use Sqoop2 import to create only a file and not a HIVE table, etc.?

I have tried running the below commands in Sqoop2.
This one works: TAB-separated part files (part-m-00000, part-m-00001, etc.) were created:
sqoop import --connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME --username god --table TABLENAME --fields-terminated-by '\t' --lines-terminated-by '\n' -P
This one fails:
sqoop import -Dmapreduce.job.user.classpath.first=true \
-Dmapreduce.output.basename=`date +%Y-%m-%d` \
--connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME \
--username nbkeplo \
--P \
--table TABLENAME \
--columns "COL1, COL2, COL3" \
--target-dir /usr/data/sqoop \
-–as-parquetfile \
-m 10
Error:
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Unrecognized argument: -–as-parquetfile
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Unrecognized argument: -m
20/01/08 09:21:23 ERROR tool.BaseSqoopTool: Unrecognized argument: 10
Try --help for usage instructions.
I want the output to be a <.parquet> file and not a HIVE table (I want to use it with Apache Spark directly, without using HIVE). Is this <.parquet> file creation possible with Sqoop import?
Importing directly to HDFS (as Avro, SequenceFile, or Parquet) is possible with Sqoop. When you output to Hive, the data is still written to HDFS, just inside the Hive warehouse for managed tables. Also, Spark is able to read from any HDFS location it has permission to access.
Your code snippets are not the same, and you didn't mention the troubleshooting steps you have tried.
I would add the --split-by, --fields-terminated-by, and --lines-terminated-by arguments to your command.
The below works:
sqoop import \
--connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME \
--username user \
--target-dir /xxx/yyy/zzz \
--as-parquetfile \
--table TABLE1 \
-P
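If more than one mapper is wanted, a split column can be added to that working command. A sketch under the assumption that TABLE1 has a numeric key column named ID (hypothetical name):
sqoop import \
--connect jdbc:oracle:thin:@999.999.999.999:1521/SIDNAME \
--username user \
--target-dir /xxx/yyy/zzz \
--as-parquetfile \
--table TABLE1 \
--split-by ID \
-m 10 \
-P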

How do you import an Oracle table to a Hive table?

I am trying to use sqoop to import an Oracle table into a Hive table:
sqoop import --connect jdbc:oracle:thin:@<server>:1521:<db> --username <user> --password <passwd> --table <table name> --hive-import --hive-table <hive_table_name> -m 1
I keep getting this error.
2018-09-13 10:55:34,825 ERROR tool.ImportTool: Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/<table_name> already exists
I want to import the Oracle table into a Hive table. What am I missing here?
The output directory for your table already exists on HDFS; you should add --target-dir (a path on HDFS).
Syntax:
sqoop import --connect jdbc:sqlserver://sqlserver-name \
--username <username> \
--password <password> \
--driver <driver-manager-class> \
--table <table-name> \
--target-dir <target-folder-name>
And then create an external Hive table based on your target-dir
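Applied to the Oracle command from the question, that import step might look something like the following (the target path is only an illustrative placeholder):
sqoop import \
--connect jdbc:oracle:thin:@<server>:1521:<db> \
--username <user> --password <passwd> \
--table <table name> \
--target-dir /user/<user>/staging/<table_name> \
-m 1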
You can use --hive-import to import from an RDBMS into Hive:
sqoop import \
--connect jdbc:mysql://localhost/learning \
--username root --password-file "/Learning/sqoop/.password" \
--table employee -m 1 \
--target-dir /Learning/sqoop/import/employee_hive \
--hive-import \
--hive-table employee.employee_hive
Change the arguments according to your requirements. You can also use --create-hive-table if you want Sqoop to create a new Hive table (the import fails if that table already exists).
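For instance, a sketch of the command above with that flag added (same hypothetical database, path, and table names as above):
sqoop import \
--connect jdbc:mysql://localhost/learning \
--username root --password-file "/Learning/sqoop/.password" \
--table employee -m 1 \
--target-dir /Learning/sqoop/import/employee_hive \
--hive-import \
--create-hive-table \
--hive-table employee.employee_hive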

Error while executing sqoop eval command

I am executing the sqoop command from my home dir. This sqoop command connects to Sybase.
The sqoop list-tables command works fine; I am able to see the list of tables:
sqoop list-tables \
--connect jdbc:sybase:Tds:omegagold82unsQ:7000/ServiceName=preprod \
--username omega123 \
--password omega1234878 \
--driver com.sybase.jdbc4.jdbc.SybDriver \
But when I execute the below sqoop eval command, it throws the following error:
sqoop eval \
--connect jdbc:sybase:Tds:omegagold82unsQ:7000/ServiceName=preprod \
--username omega123 \
--password omega1234878 \
--driver com.sybase.jdbc4.jdbc.SybDriver \
--query “SELECT * FROM customer_account LIMIT 3”
17/08/09 19:28:55 ERROR tool.BaseSqoopTool: Error parsing arguments for eval:
17/08/09 19:28:55 ERROR tool.BaseSqoopTool: Unrecognized argument: records.txt
17/08/09 19:28:55 ERROR tool.BaseSqoopTool: Unrecognized argument: sample_json.txt
17/08/09 19:28:55 ERROR tool.BaseSqoopTool: Unrecognized argument: sample_simple.txt
17/08/09 19:28:55 ERROR tool.BaseSqoopTool: Unrecognized argument: test
17/08/09 19:28:55 ERROR tool.BaseSqoopTool: Unrecognized argument: FROM
17/08/09 19:28:55 ERROR tool.BaseSqoopTool: Unrecognized argument: customer_account
17/08/09 19:28:55 ERROR tool.BaseSqoopTool: Unrecognized argument: LIMIT
17/08/09 19:28:55 ERROR tool.BaseSqoopTool: Unrecognized argument: 3”
Could someone help me with this issue?
I also faced the same issue; replacing the double quotes around the query with single quotes made it work for me.
sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root -P --query 'SELECT * FROM categories LIMIT 3'
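Applying the same fix to the Sybase command from the question, with plain single quotes around the query, would look like this (query text kept exactly as in the question):
sqoop eval \
--connect jdbc:sybase:Tds:omegagold82unsQ:7000/ServiceName=preprod \
--username omega123 \
--password omega1234878 \
--driver com.sybase.jdbc4.jdbc.SybDriver \
--query 'SELECT * FROM customer_account LIMIT 3'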

--mapreduce-name not working with sqoop

When I sqoop data and use --mapreduce-name, both in a free-form query import and in a normal import, Sqoop still gives the jar its generic name: QueryResult.jar for the free-form query, and the table name (the default) for the normal import.
Why is --mapreduce-name not taking effect? Could anyone help me out with this?
Use -D mapred.job.name=customJobName to set the name of the MR job Sqoop launches.
If it is not specified, the name defaults to the jar name for the job, which is derived from the table name used.
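A minimal sketch, with hypothetical connection details; note that -D is a generic Hadoop argument and must come before the Sqoop-specific arguments:
sqoop import \
-D mapred.job.name=my_custom_job_name \
--connect jdbc:oracle:thin:@//db.example.com:1521/ORCL \
--username scott -P \
--table EMPLOYEES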
Sqoop command syntax:
sqoop import [GENERIC-ARGS] [TOOL-ARGS]
As per the Sqoop documentation,
use -D mapred.job.name=<job_name> to set the name of the MR job that Sqoop launches.
Sample command:
sqoop import \
-D mapred.job.name=my_custom_name \
--connect 'jdbc:db2://192.xxx.xxx.xx:50000/BENCHDS' \
--driver com.ibm.db2.jcc.DB2Driver \
--username db2inst1 \
--password db2inst1 \
--table db2inst1.table1 \
--hive-import \
--hive-table hive_table1 \
--null-string '\\N' \
--null-non-string '\\N' \
--verbose

How to Optimize Sqoop import?

What are the techniques that can be used to optimize Sqoop import? I have tried using a split-by column to enable parallelism and increased the number of mappers based on the table's data volume. Will changing from FIFO to the Fair Scheduler help? Thanks in advance! My current import command is:
sqoop import \
-D mapred.job.queue.name=$queuename \
-D mapred.job.name=$table_SQOOP_INITIAL_LOAD \
-D java.security.egd=file:/dev/../dev/urandom \
-D mapred.child.java.opts=" -Djava.security.egd=file:/dev/../dev/urandom" \
--driver com.teradata.jdbc.TeraDriver \
--connect jdbc:teradata://${sqoopSourceServer}/DATABASE=${sqoopSchema} \
--username ${sqoopUsername} --password ${sqoopPassword} \
--hive-import --hive-overwrite --hive-drop-import-delims \
--null-string '\\N' --null-non-string '\\N' \
--table "$table" --num-mappers 50 --split-by column \
--target-dir ${hdfsTargetDirectory}$table \
--hive-table ${hive_database}.$table
I haven't tried it, but I have read in books that for some databases you can take advantage of direct mode by using the --direct parameter:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--table cities \
--direct
Hope this helps.
Below are some of the common performance-improvement techniques for Sqoop (a combined sketch follows this list):
split-by and boundary-query
direct
fetch-size
num-mappers
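A combined sketch of those options, reusing the Teradata connection variables from the question; the split column, fetch size, and mapper count here are purely hypothetical and should be tuned per table:
# ID_COLUMN, the fetch size, and the mapper count are illustrative values
sqoop import \
--driver com.teradata.jdbc.TeraDriver \
--connect jdbc:teradata://${sqoopSourceServer}/DATABASE=${sqoopSchema} \
--username ${sqoopUsername} --password ${sqoopPassword} \
--table "$table" \
--split-by ID_COLUMN \
--boundary-query "SELECT MIN(ID_COLUMN), MAX(ID_COLUMN) FROM $table" \
--fetch-size 10000 \
--num-mappers 20 \
--target-dir ${hdfsTargetDirectory}$table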
