Sqoop not able to import table - Hadoop

I am running the below command in Sqoop:
sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets
My version of Sqoop: Sqoop 1.4.4.2.0.6.1-101
Hadoop: Hadoop 2.2.0.2.0.6.0-101
Both are taken from the Hortonworks distribution. All the paths like HADOOP_HOME, HCAT_HOME, and SQOOP_HOME are set properly. I am able to get the list of databases and the list of tables from the MySQL database by running the list-databases and list-tables commands in Sqoop, and I am even able to get data via --query 'select * from widgets'; but when I use the --table option I get the error below.
14/02/06 14:02:17 WARN mapred.LocalJobRunner: job_local177721176_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:394)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:233)
at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:236)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:491)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:734)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 13 more

Specify --bindir, which controls where the compiled code and .jar file are placed.
Without this argument, Sqoop places the generated Java source file in your current working directory and the compiled .class and .jar files in /tmp/sqoop-<username>/compile.

Use the --bindir option and point it to your current working directory:
sqoop import --bindir ./ --connect jdbc:mysql://localhost/hadoopguide --table widgets

The problem was resolved after I copied the .class file from /tmp/sqoop-hduser/compile/ to /home/hduser/ on HDFS and also to the current working directory from which I am running Sqoop.
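A minimal sketch of that copy, assuming the generated class sits in a session subdirectory of /tmp/sqoop-hduser/compile/ (the subdirectory name is a random hash, shown here as a placeholder):
# copy the generated class into the current working directory
cp /tmp/sqoop-hduser/compile/<session-dir>/widgets.class .
# and into the HDFS home directory
hadoop fs -put widgets.class /home/hduser/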

To import a specific table into HDFS, run:
sqoop import --connect jdbc:mysql://localhost/databasename --username root --password *** --table tablename --bindir /usr/lib/sqoop/lib/ --driver com.mysql.jdbc.Driver --target-dir /directory-name
Make sure that /usr/lib/sqoop/* and /usr/local/hadoop/* are owned by the same user; otherwise you will get an error like "Permission denied". A sketch of fixing the ownership is shown below.
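For example, assuming that user is hduser in group hadoop (substitute your own user and group):
sudo chown -R hduser:hadoop /usr/lib/sqoop /usr/local/hadoop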
PS: Make sure that you have installed the MySQL JDBC connector (mysql-connector-java) before you run the command. I installed Hadoop version 2.7.3 and connector 5.0.8.

Another fix for the ClassNotFoundException is to tell Hadoop to use the user classpath first (-Dmapreduce.job.user.classpath.first=true). This can be given on the command line or in an options file. The top of an options file for import would be:
#Options file for Sqoop import
import
-Dmapreduce.job.user.classpath.first=true
This fixed the ClassNotFoundException for me when trying to import data with --as-avrodatafile.
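For reference, a sketch of invoking Sqoop with such an options file (import-options.txt is just an example file name; the import tool name is read from the first line of the file):
sqoop --options-file import-options.txt --connect jdbc:mysql://localhost/hadoopguide --table widgets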

Related

Sqoop - connect to Oracle and import data to HDFS in IBM BigInsights

I want to connect to my database (Oracle 10g) and import data to HDFS. I am using the IBM BigInsights platform, but when I use the below command:
sqoop import --connect jdbc:oracle:thin://<IP>:1521/DB --username xxx --password xxx --table t /lib/sqoop/sqoopout
Got exception running Sqoop:
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:286)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:752)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:775)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I also copied ojdbc6_g.jar into sqoop/lib.
Please help me solve this problem so that I can import data to HDFS.
What version of BigInsights are you using? Have you loaded the Oracle JDBC jar on all the nodes? Sqoop internally triggers a map job that runs from the datanodes.
To sqoop data from an Oracle database, first of all you need to download the OJDBC jar and put it into the Sqoop lib folder. Links for downloading the OJDBC jar:
https://mvnrepository.com/artifact/ojdbc/ojdbc/14
https://mvnrepository.com/artifact/com.oracle/ojdbc14/10.2.0.2.0
Apart from that, the Sqoop command for importing data over OJDBC is:
sqoop import --connect jdbc:oracle:thin:@127.0.0.1:1521:XE --username ***** --password ****** --table table_name --columns "COL1, COL2, COL3, COL4, COL5" --target-dir /xyz/zyx -m 1
Here, pay attention to the --connect argument; the connection string used has the format:
jdbc:oracle:thin:@ip_address:port_number:SID
The second format that is allowed is:
jdbc:oracle:thin:@ip_address:port_number/service_name
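For instance, a sketch of the second form with a hypothetical service name ORCLPDB:
jdbc:oracle:thin:@127.0.0.1:1521/ORCLPDB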
Hope this helps.
P.S. - If you are unable to add the OJDBC jar to Sqoop's lib, you can also append the path of the jar file to the $HADOOP_CLASSPATH variable.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/root/shared_folder/ojdbc6.jar
P.P.S. - chmod the OJDBC jar to 777 before execution.
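For example, matching the path used above:
chmod 777 /root/shared_folder/ojdbc6.jar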

Sqoop Export Oozie Workflow Fails with File Not Found, Works when Run from the Console

I have a Hadoop cluster with 6 nodes. I'm pulling data out of MSSQL and back into MSSQL via Sqoop. Sqoop import commands work fine, and I can run a Sqoop export command from the console (on one of the Hadoop nodes). Here's the shell script I run:
SQLHOST=sqlservermaster.local
SQLDBNAME=db1
HIVEDBNAME=db1
BATCHID=
USERNAME="sqlusername"
PASSWORD="password"
sqoop export --connect 'jdbc:sqlserver://'$SQLHOST';username='$USERNAME';password='$PASSWORD';database='$SQLDBNAME'' --table ExportFromHive --columns col1,col2,col3 --export-dir /apps/hive/warehouse/$HIVEDBNAME.db/hivetablename
When I run this command from an Oozie workflow, passing the same parameters, I receive the error below (found by digging into the actual job run logs from the YARN scheduler screen):
2015-10-01 20:55:31,084 WARN [main] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job init failed
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://hadoopnode1:8020/user/root/.staging/job_1443713197941_0134/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1312)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1080)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://hadoopnode1:8020/user/root/.staging/job_1443713197941_0134/job.splitmetainfo
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1563)
... 17 more
Has anyone ever seen this and been able to troubleshoot it? It only happens from the Oozie workflow. There are similar topics, but no one seems to have solved this specific problem.
Thanks!
I was able to solve this problem by setting the user.name property in the job.properties file for the Oozie workflow to the user yarn:
user.name=yarn
I think the problem was that it did not have permission to create the staging files under /user/root. Once I changed the running user to yarn, the staging files were created under /user/yarn, which did have the proper permissions.
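A minimal job.properties sketch along those lines; the nameNode value comes from the error above, while the jobTracker address and workflow path are placeholders for your own cluster:
# job.properties for the Oozie workflow (addresses and paths are examples)
nameNode=hdfs://hadoopnode1:8020
jobTracker=hadoopnode1:8050
user.name=yarn
oozie.wf.application.path=${nameNode}/path/to/workflow-app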

Exception while importing data using Sqoop

I am importing data from MySQL to Hive using the command:
sqoop import --connect jdbc:mysql://localhost:3306/mydb --username root --table mytable --hive-import
and I am getting this error message:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/opt/sqoop/lib/jackson-databind-2.3.1.jar
I am using Hadoop 2.6.0 and JDK 1.7.0_79.
Please provide a solution for how to overcome this problem, as I have been stuck at this point.
All the steps up to importing the data completed successfully.

sqoop to transfer data to HDFS from Teradata

I am trying to use Sqoop to transfer data from Teradata to HDFS and am getting the error below:
-bash-4.1$ sqoop import --connection-manager com.cloudera.sqoop.manager.DefaultManagerFactory --driver com.teradata.jdbc.TeraDriver \
--connect jdbc:teradata://dwsoat.dws.company.co.uk/DATABASE=TS_72258_BASELDB \
--username userid -P --table ADDRESS --num-mappers 3 \
--target-dir /user/nathalok/ADDRESS
Warning: /apps/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/29 14:00:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.3
14/10/29 14:00:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/10/29 14:00:14 ERROR sqoop.ConnFactory: Sqoop wasn't able to create connnection manager properly. Some of the connectors supports explicit --driver and some do not. Please try to either specify --driver or leave it out.
14/10/29 14:00:14 ERROR tool.BaseSqoopTool: Got error creating database manager: java.io.IOException: java.lang.NoSuchMethodException: com.cloudera.sqoop.manager.DefaultManagerFactory.<init>(java.lang.String, com.cloudera.sqoop.SqoopOptions)
at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:165)
at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:243)
at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:84)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:494)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
Caused by: java.lang.NoSuchMethodException: com.cloudera.sqoop.manager.DefaultManagerFactory.<init>(java.lang.String, com.cloudera.sqoop.SqoopOptions)
at java.lang.Class.getConstructor0(Class.java:2810)
at java.lang.Class.getDeclaredConstructor(Class.java:2053)
at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:151)
... 9 more
-bash-4.1$
Any help will be appreciated.
To get Teradata working properly with a Cloudera distribution, you need to do the following (see the sketch after these steps):
1. Install the Teradata JDBC jars in /var/lib/sqoop. For me these were terajdbc4.jar and tdgssconfig.jar.
2. Install either the Cloudera Connector Powered by Teradata or the Cloudera Connector for Teradata somewhere on your filesystem (I prefer /var/lib/sqoop).
3. In /etc/sqoop/conf/managers.d/, create a file (of any name) and add com.cloudera.connector.teradata.TeradataManagerFactory=<location of connector jar>. For example, I have /etc/sqoop/conf/managers.d/teradata => com.cloudera.connector.teradata.TeradataManagerFactory=/var/lib/sqoop/sqoop-connector-teradata-1.2c5.jar.
There are different ways to install the Teradata connector as well. For example, it may be easier to use Cloudera Manager.
If you're still having trouble, try reaching out to the sqoop mailing list.
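Once the connector is registered via managers.d, the original command can drop both --connection-manager and --driver and let Sqoop pick up the Teradata manager automatically. As a sketch, based on the command above:
sqoop import \
--connect jdbc:teradata://dwsoat.dws.company.co.uk/DATABASE=TS_72258_BASELDB \
--username userid -P --table ADDRESS --num-mappers 3 \
--target-dir /user/nathalok/ADDRESS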

Sqoop import not working in Hadoop 2.x

I installed Hadoop-2.0.3 and Sqoop-1.4.4 and run Hadoop in pseudo-distributed mode. When I try to import a table from an RDBMS to HDFS by issuing the below command:
master#hadoop:~/apps/sqoop-1.4.4$ bin/sqoop import --connect jdbc:mysql://localhost:3306/hadoop --username root --password root --table employees
I get the following error:
14/02/10 05:20:32 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
Can you please provide a solution for this?
