Hadoop Sqoop error while importing data using an options file - sqoop

I am new to Hadoop and, while practicing Sqoop, I got the error message below. The command I used is as follows:
I created an import.txt file and in it I put
import --connect jdbc:mysql://localhost/hadoopdb --username hadoop -P
and placed this file on HDFS.
When importing, I passed this file to the Sqoop tool using the --options-file argument, so the final command I ran at the command prompt is:
sqoop --options-file /user/cloudera/import.txt --table employee
After hitting the Enter key, I got the following error message:
sqoop --options-file /user/cloudera/import.txt --table employee
13/10/16 13:43:12 ERROR sqoop.Sqoop: Error while expanding arguments
java.lang.Exception: Unable to read options file: /user/cloudera/import.txt
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:102)
at com.cloudera.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:33)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:201)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.io.FileNotFoundException: /user/cloudera/import.txt (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at java.io.FileReader.<init>(FileReader.java:55)
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:70)
... 4 more
Unable to read options file: /user/cloudera/import.txt
Can anyone tell me why this error occurs?
Thanks in advance.

The --options-file path should point to a file on the local file system, not to an HDFS path.
sqoop --options-file /home/cloudera/import.txt --table employee
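A minimal sketch of that fix, assuming the file currently sits on HDFS at /user/cloudera/import.txt (the local destination path is only an example):
hdfs dfs -get /user/cloudera/import.txt /home/cloudera/import.txt
sqoop --options-file /home/cloudera/import.txt --table employee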

I got the same issue and solved it using the following approach.
In the options file you have to list the tool, the commands, and their arguments line by line.
In your case, your options file "import.txt" should be created like this:
$cat > import.txt
import
--connect
jdbc:mysql://localhost/hadoopdb
--username
hadoop
-P
After you have created the options file, you can use it to import the table:
sqoop --options-file /user/cloudera/import.txt --table employee
Hope this works. The key is that you have to list the tool and its arguments line by line.
For more on this, refer to the Sqoop User Guide by Apache.org.
Correct me if I am wrong.

If you are calling Sqoop from Oozie and are facing the same "Unable to read options file" issue:
You need to place the options file inside the workflow location, specify the file in the Sqoop action's files, and also change the permission of that file with chmod 674 (when the workflow runs in Oozie, it runs as the sqoop user, so changing the permission is mandatory).
This will resolve the error.
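A rough sketch of those two steps, assuming a hypothetical workflow application directory on HDFS:
hdfs dfs -put import.txt /user/oozie/apps/my-sqoop-wf/import.txt
hdfs dfs -chmod 674 /user/oozie/apps/my-sqoop-wf/import.txt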

I put the options file in a local directory, and it worked.
Also, the argument and its value should be on different lines,
like
--where
'sal > 5000'
and not like
--where 'sal > 5000'

[cloudera@quickstart sqoop]$ sqoop --options-file /home/cloudera/Desktop/SqoopOptions.txt --table departments --username root --password cloudera -m 1 --target-dir jan1301
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
No such sqoop tool: import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera. See 'sqoop help'.
I received the above error when I defined the SqoopOptions.txt file contents on a single line.
The issue was resolved when I defined each parameter and value on a separate line, like below.
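The corrected file was not shown in the post; based on the single-line contents echoed in the error above, it would presumably look like this:
import
--connect
jdbc:mysql://localhost/retail_db
--username
root
--password
cloudera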

If you are trying this on a single-node cluster, the options file can be placed on the local file system.
Your options file should look like this:
import
--connect
"jdbc:mysql://localhost:3306/sakila"
--username root
-P
Each parameter should start on a new line.
Once you have saved the options file, use the command below:
sqoop --options-file "your optionfile location" --table abc
Hope this works; this approach is working perfectly for me.
Thanks,
Suresh.

Related

Sqoop import job error org.kitesdk.data.ValidationException for Oracle

Sqoop import job for Oracle 11g fails with error
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name 81fdfb8245ab4898a719d4dda39e23f9_C46010.HISTCONTACT is not alphanumeric (plus '_')
here's the complete command:
$ sqoop job --create ingest_amsp_histcontact -- import --connect "jdbc:oracle:thin:@<IP>:<PORT>/<SID>" --username "c46010" -P --table C46010.HISTCONTACT --check-column ITEM_SEQ --target-dir /tmp/junk/amsp.histcontact -as-parquetfile -m 1 --incremental append
$ sqoop job --exec ingest_amsp_histcontact
It's an incremental import in Parquet format. Surprisingly, it works fine if I use another format such as --as-textfile.
This is a similar issue to "Sqoop job fails with KiteSDK validation error for Oracle import",
but I've already used ojdbc6, and switching to ojdbc7 doesn't work either.
Sqoop version: 1.4.7
Oracle version: 11g
Thanks,
Yusata
I know it is kind of late, but I faced the same problem and solved it by omitting the Parquet file option.
Try running the job without
-as-parquetfile
There's a workaround: omitting the "." character in the --table parameter works for me, so instead of --table <schema>.<table_name> I use --table <table_name>. But this doesn't work if you import a table from another schema in Oracle.
The problem is the "." in the --target-dir option. Workaround: change the target dir to "/tmp/junk/amsp_histcontact". When the Sqoop job finishes, rename the HDFS target dir to "/tmp/junk/amsp.histcontact".
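The rename step can be done with the plain HDFS shell, for example:
hdfs dfs -mv /tmp/junk/amsp_histcontact /tmp/junk/amsp.histcontact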

passing mysql properties via sqoop eval

The sqoop eval command:
sqoop eval --connect 'jdbc:mysql://<connection url>' --driver com.mysql.jdbc.Driver --query "select max(rdate) from test.sqoop_test"
gives me this output:
Warning: /usr/hdp/2.3.2.0-2950/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/hdp/2.3.2.0-2950/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
16/10/05 18:38:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.3.2.0-2950
16/10/05 18:38:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/05 18:38:17 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/10/05 18:38:17 INFO manager.SqlManager: Using default fetchSize of 1000
--------------
| max(rdate) |
--------------
| 2014-01-25 |
--------------
But I want the output without the warnings and table boundaries, like:
max(rdate) 2014-01-25
I basically want to store this output in a file.
Thanks in advance.
You can perform a Sqoop import operation to save the output in HDFS.
The warnings are straightforward:
You can set $ACCUMULO_HOME and $ZOOKEEPER_HOME if they are available.
You can set --connection-manager to the manager corresponding to MySQL.
For the sake of security, it's recommended to use -P for the password rather than writing it on the command line.
These are not errors; you can live with these warnings.
We have two ways to get the query results. One way is to create a .sh file, write your sqoop command into it, and then run it as
shell_file_name.sh > your_output_file.txt
The other way is to write the query results to HDFS with an import (--target-dir /path) and read them from there.
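A minimal sketch of the script approach, reusing the eval command from the question (the script and output file names are just examples):
#!/bin/bash
# eval_rdate.sh -- wraps the sqoop eval call so its output can be redirected to a file
sqoop eval --connect 'jdbc:mysql://<connection url>' --driver com.mysql.jdbc.Driver --query "select max(rdate) from test.sqoop_test"
Run it as ./eval_rdate.sh > your_output_file.txt.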
You can change the file system option in the sqoop command to store the results of the import query; the idea is that you import the data to the local file system rather than HDFS.
e.g.: sqoop import -fs local -jt local --connect "connection string" --username root --password root --query "Select * from table" --target-dir /home/output
https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1762587

Sqoop job fails with KiteSDK validation error for Oracle import

I am attempting to run a Sqoop job to load from an Oracle DB into Parquet format on a Hadoop cluster. The job is incremental.
Sqoop version is 1.4.6. Oracle version is 12c. Hadoop version is 2.6.0 (distro is Cloudera 5.5.1).
The Sqoop command is (this creates the job, and executes it):
$ sqoop job -fs hdfs://<HADOOPNAMENODE>:8020 \
--create myJob \
-- import \
--connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> \
--username <USERNAME> \
-P \
--as-parquetfile \
--table <USERNAME>.<TABLENAME> \
--target-dir <HDFSPATH> \
--incremental append \
--check-column <TABLEPRIMARYKEY>
$ sqoop job --exec myJob
Error on execute:
16/02/05 11:25:30 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name 05112528000000918_2088_<USERNAME>.<TABLENAME> is not alphanumeric (plus '_')
at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:103)
at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:66)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
at org.kitesdk.data.Datasets.create(Datasets.java:239)
at org.kitesdk.data.Datasets.create(Datasets.java:307)
at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:80)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:106)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:668)
at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Troubleshooting Steps:
0) HDFS is stable, other Sqoop jobs are functional, Oracle source DB is up and the connection has been tested.
1) I tried creating a synonym in Oracle so that I could simply have the --table option as:
--table TABLENAME (without the username)
This gave me an error that the table name was not correct. It needs the full USERNAME.TABLENAME for the --table option.
Error:
16/02/05 12:04:46 ERROR tool.ImportTool: Imported Failed: There is no column found in the target table <TABLENAME>. Please ensure that your table name is correct.
2) I verified that this is a Parquet issue: I removed the --as-parquetfile option and the job was successful.
3) I wondered if this is somehow caused by the incremental options. I removed the --incremental append & --check-column options and the job was successful. This confuses me.
4) I tried the job with MySQL and it was successful.
Has anyone run into something similar? Is there a way (or is it even advisable) to disable the Kite validation? It seems that the dataset is being created with dots ("."), which then Kite SDK complains about - but this is an assumption on my part as I am not too familiar with Kite SDK.
Thanks in advance,
Jose
Resolved. There seems to be a known issue with JDBC connectivity to Oracle 12c. Using a specific OJDBC6 jar (instead of 7) did the trick. FYI: the OJDBC jar is installed in /usr/share/java/ and a symbolic link is created in /installpath.../lib/sqoop/lib/
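A sketch of that layout; <installpath> stands in for the elided Sqoop install path above, and the exact ojdbc6 jar name depends on the version you download:
cp ojdbc6.jar /usr/share/java/
ln -s /usr/share/java/ojdbc6.jar <installpath>/lib/sqoop/lib/ojdbc6.jar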
As reported by user @Remya Senan,
breaking the parameter
--hive-table my_hive_db_name.my_hive_table_name
into separate params
--hive-database my_hive_db_name
--hive-table my_hive_table_name
did the trick for me
My environment was
Sqoop v1.4.7
Hive 2.3.3
Tip: I was on emr-5.19.0
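A hedged sketch of how the split flags sit in a full Hive import command (the connection placeholders echo the question above; this is illustrative, not the poster's exact command):
sqoop import --connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> --username <USERNAME> -P --table <TABLENAME> --hive-import --hive-database my_hive_db_name --hive-table my_hive_table_name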
I also got this error when I was Sqoop-importing all tables as Parquet files on CDH 5.8. Looking at the error message, I felt this implementation does not support directories with "-" in their name. Based on this understanding, I removed the "-" from the directory name, re-ran the sqoop import command, and everything worked fine. Hope this helps!

sqoop import issue with mysql

I have a Hadoop HA setup based on CDH5. I have tried to import tables from MySQL using Sqoop, but it failed with the following error.
15/03/20 12:47:53 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@33573e93 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@33573e93 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
I have used the below command:
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/haddata --username root --password password --table authors --hive-import
My MySQL server version is 5.1.73-3, and I have used versions 5.1.34 and 5.1.17 of mysql-connector-java.
Sqoop version is 1.4.5-cdh5.3.2.
Please let me know any suggestions/comments.
Try including the option --driver com.mysql.jdbc.Driver in the import command.
Try using the modified command below, which should suit your purpose:
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/haddata --driver com.mysql.jdbc.Driver --username root --password password --table authors --hive-import
follow this link
Include the driver argument --driver com.mysql.jdbc.Driver in the sqoop command:
sqoop import --connect jdbc:mysql://<mysql hostname>:3306/<db name> --username **** --password **** --table <table name> --hive-import --driver com.mysql.jdbc.Driver
The --driver parameter forces Sqoop to use the latest mysql-connector-java.jar installed for the MySQL DB on the Sqoop machine.
Try with mysql-connector-java-5.1.31.jar; it is compatible with Sqoop 1.4.5.
The mysql-connector-java-5.1.17.jar driver does not work with Sqoop 1.4.5.
Refer to:
https://issues.apache.org/jira/browse/SQOOP-1400
If you have com.mysql.jdbc_5.1.5.jar or any version of com.mysql.jdbc_5.X.X.jar in the $HADOOP_HOME/bin folder, then remove it and execute your Sqoop query.
Including the option --driver com.mysql.jdbc.Driver in the import command worked for me.
Sqoop does not ship with third-party JDBC drivers. You must download them separately and save them to the /var/lib/sqoop/ directory on the server.
Note:
The JDBC drivers need to be installed only on the machine where Sqoop runs. You do not need to install them on all hosts in your Hadoop cluster.
You can download the driver from here: https://dev.mysql.com/downloads/connector/j/5.1.html
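Placing the downloaded connector jar where Sqoop can see it might look like this (the version string is a placeholder for whichever release you download):
cp mysql-connector-java-<version>-bin.jar /var/lib/sqoop/
chmod 644 /var/lib/sqoop/mysql-connector-java-<version>-bin.jar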
Try the exact command below:
sqoop import --connect "jdbc:mysql://localhost:3306/books" --username=root --password=root --table authors --as-textfile --target-dir=/datasqoop/authors_db --columns "id, name, email" --split-by id --driver com.mysql.jdbc.Driver
This will resolve your issues.
Find the jar locations being used by Sqoop. In my case, it points to the symlink /usr/share/java/mysql-connector-java.jar,
and when I check that link, it points to mysql-connector-java-5.1.17.jar:
/usr/share/java/mysql-connector-java.jar -> mysql-connector-java-5.1.17.jar
As 5.1.17 has this issue, try 5.1.37 or higher:
unlink /usr/share/java/mysql-connector-java.jar
ln -s /usr/share/java/mysql-connector-java-5.1.37.jar /usr/share/java/mysql-connector-java.jar

I couldn't import tables from SQL Server to Hive through Sqoop

When I pass the command:
$sqoop create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433;username=cloud;password=cloud123;database=hadoop' --table cluster
Some errors and warnings appear, and at the end it says:
Failed to start database '/var/lib/hive/metastore/metastore_db', see the next exception for details [again a list of import errors displayed]
Finally it says Hive exited with status 9.
What is the problem here? I am new to Sqoop and Hive. Can anyone please help me?
The correct syntax would be
sqoop import --connect 'jdbc:sqlserver://10.100.0.18:1433/hadoop' --username cloud --password cloud123 --table cluster --hive-import
I think you might want to check whether you have write permissions to the specified directory and whether a directory named metastore_db is being created.
This message is usually shown when you're running Sqoop with the default Hive configuration. Hive will by default use a Derby datastore, which is usable only in very basic test use cases. I would recommend reconfiguring your Hive instance to use some other relational database as the metastore back end (MySQL, PostgreSQL, Oracle).
Your syntax is wrong. The syntax is: $ sqoop tool-name [tool-arguments]
$sqoop import --create-hive-table --connect 'jdbc:sqlserver://10.100.0.18:1433/hadoop' --username cloud --password cloud123 --table cluster
Here is a sample Hive import call using Sqoop. This might help you correct your syntax further. Remember that, at a minimum, you essentially need the command below to make it work:
sqoop import --connect jdbc:mysql://localhost/RAWDATA --table geolocation --username root --password hadoop --hive-import --create-hive-table --driver com.mysql.jdbc.Driver --m 1 --delete-target-dir
--connect: the part which reads /RAWDATA is the database name from your MySQL instance that contains the geolocation table. You can execute the 'show databases' and 'show tables' commands in MySQL to check your databases and tables.
--delete-target-dir is used for safety. It ensures Sqoop deletes the temporary directory it creates to write the file before moving it into Hive. This avoids unnecessary "directory already exists" errors in case you retry the command.
--create-hive-table is required only if you have not already created the target table in Hive. If your previous runs of the sqoop command already created the table, then you can omit this option completely. Check your Hive database for the existence of the target Hive table.
--driver is a mandatory part of the command for the database connection. Make sure you either find the right path to the driver library or search for alternatives. You can first try the one pasted above to see if it does the trick, and you can come back to this forum for help.
Remember that we did not specify which Hive database the table will be created in, so it will go into Hive's default database. I am not including that option since you are just getting started with Sqoop.
