Exception while importing data using Sqoop - hadoop

I am importing data from MySQL to Hive using the command
sqoop import --connect jdbc:mysql://localhost:3306/mydb --username root --table mytable --hive-import
and I am getting this error message:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/opt/sqoop/lib/jackson-databind-2.3.1.jar
I am using Hadoop 2.6.0 and JDK 1.7.0_79.
All the steps before the actual import completed successfully. Please suggest what I have to do to get past this error, as I have been stuck at this point.
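A workaround that is often reported for this kind of error (the import job resolving local Sqoop jars against hdfs://) is to copy the Sqoop lib jars to the same path in HDFS so the lookup succeeds. This is only a sketch, assuming Sqoop is installed under /opt/sqoop on the local filesystem:
hdfs dfs -mkdir -p /opt/sqoop/lib
hdfs dfs -put /opt/sqoop/lib/*.jar /opt/sqoop/lib/
After that, re-run the same sqoop import command.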

Related

Sqoop import as ORC ERROR java.io.IOException: HCat exited with status 1

I am trying to import a table from a Netezza DB in ORC format using Sqoop with HCatalog (see below), as suggested here.
Sqoop command:
sqoop import \
-m 1 \
--connect <jdbc_url> \
--driver <database_driver> \
--connection-manager org.apache.sqoop.manager.GenericJdbcManager \
--username <db_username> \
--password <db_password> \
--table <table_name> \
--hcatalog-home /usr/hdp/current/hive-webhcat \
--hcatalog-database <hcat_db> \
--hcatalog-table <table_name> \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'
However, it failed with the following exception. After spending a few hours on it, I have no clue as to why it is failing. Any help or lead is much appreciated.
16/04/21 19:51:22 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: HCat exited with status 1
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.executeExternalHCatProgram(SqoopHCatUtilities.java:1148)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.launchHCatCli(SqoopHCatUtilities.java:1097)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.createHCatTable(SqoopHCatUtilities.java:644)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:340)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:802)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
UPDATE:
I can see that an empty table was created even though the source table has 200k records.
Any suggestions to fix this issue?
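One debugging approach sometimes suggested for this (not confirmed here) is to pre-create the ORC table in Hive and drop --create-hcatalog-table, so that any problem with the generated DDL surfaces directly in Hive rather than as an opaque "HCat exited with status 1". The column list below is a placeholder and must match the source table:
hive -e 'CREATE TABLE <hcat_db>.<table_name> (col1 STRING, col2 INT) STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY")'
Re-running the same sqoop import with --verbose can also help surface the underlying HCat error.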

sqoop - connect to oracle and import data to HDFS in IBM BigInsights

I want to connect to my database (Oracle 10g) and import data to HDFS.
I am using the IBM BigInsights platform.
But when I use the command below:
sqoop import --connect jdbc:oracle:thin://<IP>:1521/DB --username xxx --password xxx --table t /lib/sqoop/sqoopout
Got exception running Sqoop:
java.lang.RuntimeException: Could not load db driver class:
oracle.jdbc.OracleDriver
java.lang.RuntimeException: Could not load db driver class:
oracle.jdbc.OracleDriver
at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:286)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:752)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:775)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I also copied ojdbc6_g.jar into sqoop/lib.
Please help me solve this problem so that I can import data into HDFS.
What version of BigInsights are you using? Have you loaded the Oracle JDBC jar on all the nodes? Sqoop internally triggers a map job that runs on the data nodes.
To sqoop data from an Oracle database, you first need to download the OJDBC jar and put it into the Sqoop lib folder. Links for downloading the OJDBC jar:
https://mvnrepository.com/artifact/ojdbc/ojdbc/14
https://mvnrepository.com/artifact/com.oracle/ojdbc14/10.2.0.2.0
Apart from that, the Sqoop command for importing data over OJDBC is:
sqoop import --connect jdbc:oracle:thin:@127.0.0.1:1521:XE --username ***** --password ****** --table table_name --columns "COL1, COL2, COL3, COL4, COL5" --target-dir /xyz/zyx -m 1
Here pay attention to the --connect argument; the connection string used has the format:
jdbc:oracle:thin:@ip_address:port_number:SID
The second format that is allowed is:
jdbc:oracle:thin:@ip_address:port_number/service_name
Hope this helps.
P.S. - If you are unable to add the OJDBC jar to Sqoop's lib directory, you can also append the path of the jar file to the $HADOOP_CLASSPATH variable.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/root/shared_folder/ojdbc6.jar
P.P.S - chmod the ojdbc jar to 777 before execution.
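Putting this together for the command from the question (placeholders kept as in the question; the @-style connect string, the explicit --target-dir, and -m 1 are assumptions about the intended setup):
sqoop import --connect jdbc:oracle:thin:@<IP>:1521/DB --username xxx --password xxx --table t --target-dir /lib/sqoop/sqoopout -m 1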

Sqoop job unable to work with Hadoop Credential API

I have stored my database passwords in Hadoop CredentialProvider.
Sqoop import from terminal is working fine, successfully fetching the password from CredentialProvider.
sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks \
--table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay --password-alias db2-dev-password
But when I try to setup as a Sqoop job, it is unable to recognize the -Dhadoop.security.credential.provider.path argument.
sqoop job --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --password-alias
Following is the error message:
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: --password-alias
14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: db2-dev-password
I couldn't find any special instructions in the Sqoop User Guide for using the Hadoop credential API with a Sqoop job.
How do I resolve this issue?
Repositioning the Sqoop parameters solves the problem.
sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay --password-alias myPasswordAlias
Place the Hadoop credential provider option right after sqoop job, before the --create keyword.
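Once the job is saved this way, it can be run with sqoop job --exec; depending on the setup, the provider path may need to be passed again at execution time, for example:
sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --exec my-sqoop-job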
Your Sqoop job command is not right; the --password-alias option is incomplete.
Execute the command below on your Hadoop server:
hadoop credential list -provider jceks://hdfs/user/vijay/myPassword.jceks
Use the output of that command in the Sqoop job command below:
sqoop job --create my-sqoop-job -- import --table myTable -m 1 --target-dir /user/vijay/output --delete-target-dir --username vijay -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks --password-alias <<output of above command>>
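For completeness, the alias itself is created with the hadoop credential create command, which prompts for the password value, for example:
hadoop credential create db2-dev-password -provider jceks://hdfs/user/vijay/myPassword.jceks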

Sqoop import not working in Hadoop 2.x

I installed Hadoop 2.0.3 and Sqoop 1.4.4 and run Hadoop in pseudo-distributed mode. When I try to import a table from an RDBMS to HDFS by issuing the command below
master#hadoop:~/apps/sqoop-1.4.4$ bin/sqoop import --connect jdbc:mysql://localhost:3306/hadoop --username root --password root --table employees
I get the following error:
14/02/10 05:20:32 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
Can you please provide a solution for this?
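This message is usually reported when Hadoop 1.x and Hadoop 2.x client jars end up mixed on the classpath, for example when the Sqoop binary built against Hadoop 1 is run against a Hadoop 2 cluster. As a rough check, look for stray Hadoop 1.x jars on the Sqoop classpath:
ls $SQOOP_HOME/lib | grep -i hadoop
If that is the case, one common suggestion is to switch to the Sqoop binary built against Hadoop 2 (e.g. sqoop-1.4.4.bin__hadoop-2.0.4-alpha).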

SQOOP Not able to import table

I am running the command below with Sqoop:
sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets
My version of Sqoop: Sqoop 1.4.4.2.0.6.1-101
Hadoop: Hadoop 2.2.0.2.0.6.0-101
Both are taken from the Hortonworks distribution. All the paths like HADOOP_HOME, HCAT_HOME, and SQOOP_HOME are set properly. I am able to get the list of databases and the list of tables from the MySQL database by running the list-databases and list-tables commands in Sqoop. I am even able to get data from --query 'select * from widgets', but when I use the --table option I get the error below.
14/02/06 14:02:17 WARN mapred.LocalJobRunner: job_local177721176_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:394)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:233)
at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:236)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:491)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:734)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 13 more
Specify --bindir, the directory where the compiled code and .jar file should be placed.
Without this argument, Sqoop places the generated Java source file in your current working directory, but the compiled .class file and .jar file in /tmp/sqoop-<username>/compile.
Use the --bindir option and point it at your current working directory.
sqoop import --bindir ./ --connect jdbc:mysql://localhost/hadoopguide --table widgets
The problem was resolved after I copied the .class file from /tmp/sqoop-hduser/compile/ to /home/hduser/ on HDFS and also to the current working directory from which I run Sqoop.
To import a specific table into HDFS, run:
sqoop import --connect jdbc:mysql://localhost/databasename --username root --password *** --table tablename --bindir /usr/lib/sqoop/lib/ --driver com.mysql.jdbc.Driver --target-dir /directory-name
Make sure that /usr/lib/sqoop/* and /usr/local/hadoop/* are owned by the same user, otherwise you will get an error like "Permission denied".
PS: Make sure that you have installed the MySQL Java connector before you run the command. I used Hadoop 2.7.3 and connector 5.0.8.
Another fix for the ClassNotFoundException is to tell Hadoop to use the user classpath first (-Dmapreduce.job.user.classpath.first=true). This can be set on the command line or in an options file. The top of an import options file would be:
#Options file for Sqoop import
import
-Dmapreduce.job.user.classpath.first=true
This fixed the ClassNotFoundException for me when trying to import data with --as-avrodatafile.
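If the options live in a file, say import-options.txt (a hypothetical name), the job can then be launched with Sqoop's --options-file flag, with the remaining arguments appended on the command line:
sqoop --options-file import-options.txt --connect jdbc:mysql://localhost/hadoopguide --table widgets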
