Sqoop error using Oozie - JDBC

I have a problem with an Oozie workflow.
I import a file using Sqoop with the command
sqoop import --connect jdbc:oracle:thin:@ip:sid --username --password --target-dir
From the command line it works, but scheduled in Oozie it gives me this error:
Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Why am I getting this error?

Could you add more information to the question? What does your Oozie workflow.xml look like?
I would guess that you don't have
<property>
<name>oozie.use.system.libpath</name>
<value>true</value>
</property>
set in your job configuration (typically as oozie.use.system.libpath=true in job.properties), which results in the Sqoop jar not being found on the classpath. Add that property and make sure
/user/oozie/share/lib/
contains the Sqoop jar. That should fix your issue.
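For reference, a minimal job.properties sketch (host names and the application path are placeholders, adjust for your cluster):
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/sqoop-workflow
You can also check what the sharelib actually contains with:
oozie admin -oozie http://localhost:11000/oozie -shareliblist sqoop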

Related

oozie java.io.IOException: No FileSystem for scheme: hdfs

I have set up Oozie 4.3.1 with Hadoop 2.7.3.
Oozie has been set up and is running successfully; I am able to see the web console at http://localhost:11000/oozie/
and can also confirm it is running with the oozie status command.
Issue 1:
While running the Oozie examples, after changing job.properties with the relevant details, I get the following error.
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/examples/apps/map-reduce/job.properties -run
Error: E0902 : E0902: Exception occured: [No FileSystem for scheme: hdfs]
Issue 2: oozie admin -sharelibupdate
[ShareLib update status]
host = http://f091403isdpbato05:11000/oozie
status = java.io.IOException: No FileSystem for scheme: hdfs
The HDFS path and the other Oozie-related .xml files have also been updated with the proper configuration.
Please let me know of any solution so I can move ahead.
You can try adding the following to your core-site.xml:
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
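After updating core-site.xml, a quick sanity check that the hdfs:// scheme resolves (assuming the Hadoop client reads the same core-site.xml):
hdfs getconf -confKey fs.defaultFS
hadoop fs -ls hdfs://localhost:9000/
If both commands work but Oozie still reports the error, make sure the updated configuration is the one Oozie actually reads, restart Oozie, and re-run oozie admin -sharelibupdate.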

Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server

I am new to Oozie and was following this tutorial for my first Oozie Hive job.
As given in the tutorial, I created the following files in a directory:
hive-default.xml
hive_job1.hql
job.properties
workflow.xml
But when I run this command:
oozie job -oozie http://localhost:11000/ -config /home/ec2-user/ankit/oozie_job1/job.properties -submit
I get the following error:
Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, Authentication failed, status: 404, message: Not Found
I tried finding a solution for this on the internet, but none solved the problem (I might have missed something).
Please let me know where I am going wrong and what additional information is required from my side to understand the problem.
The error is because of an incorrect value for the -oozie parameter: you forgot to add oozie at the end. It should be -oozie http://localhost:11000/oozie
oozie job -oozie http://localhost:11000/oozie -config /home/ec2-user/ankit/oozie_job1/job.properties -submit
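To avoid repeating the URL, you can also export it once and drop the -oozie flag (standard Oozie CLI behaviour):
export OOZIE_URL=http://localhost:11000/oozie
oozie admin -status
oozie job -config /home/ec2-user/ankit/oozie_job1/job.properties -submit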
Please try setting the following properties in core-site.xml:
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
where * allows the oozie user to impersonate users from any host and any group.
Restart the Hadoop cluster after making the above changes.
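One way to restart, assuming a plain Apache Hadoop installation with the sbin scripts on the PATH (adjust for your distribution or cluster manager):
stop-yarn.sh && stop-dfs.sh
start-dfs.sh && start-yarn.sh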

Sqoop - Hive import using Oozie failed

I am trying to execute a Sqoop import from Oracle to Hive, but the job fails with this error:
WARN [main] conf.HiveConf (HiveConf.java:initialize(2472)) - HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
I have all the jar files in place.
hive-site.xml is also in place, with the Hive metastore configuration:
<property>
<name>hive.metastore.uris</name>
<value>thrift://sv2lxgsed01.xxxx.com:9083</value>
</property>
I am able to run a Sqoop import (using Oozie) to HDFS successfully.
I am also able to execute a Hive script (using Oozie) successfully.
I can also execute the Sqoop-to-Hive import from the command line, but the same command fails when I execute it using Oozie.
My workflow.xml is below:
<workflow-app name="WorkflowWithSqoopAction" xmlns="uri:oozie:workflow:0.1">
<start to="sqoopAction"/>
<action name="sqoopAction">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect jdbc:oracle:thin:@//sv2axcrmdbdi301.xxx.com:1521/DI3CRM --username xxxxxxx --password xxxxxx --table SIEBEL.S_ORG_EXT --hive-table eg.EQX_EG_CRM_S_ORG_EXT --hive-import -m 1</command>
<file>/user/oozie/oozieProject/workflowSqoopAction/hive-site.xml</file>
</sqoop>
<ok to="end"/>
<error to="killJob"/>
</action>
<kill name="killJob">
<message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
</kill>
<end name="end" />
</workflow-app>
I can also find the data being loaded in HDFS.
You need to do two things:
1) Copy hive-site.xml into the Oozie workflow directory on HDFS. 2) In your Sqoop action, tell Oozie to use that hive-site.xml, as sketched below.
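A minimal sketch using the paths from the workflow above (adjust to your own workflow directory):
hdfs dfs -put -f hive-site.xml /user/oozie/oozieProject/workflowSqoopAction/
and, inside the Sqoop action in workflow.xml:
<file>/user/oozie/oozieProject/workflowSqoopAction/hive-site.xml#hive-site.xml</file>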

oozie Sqoop action fails to import data to hive

I am facing an issue while executing an Oozie Sqoop action.
In the logs I can see that Sqoop is able to import the data to a temp directory, and then creates a Hive script to import that data into Hive.
It fails while importing the temp data into Hive.
I am not getting any exception in the logs.
Below is the Sqoop action I am using.
<workflow-app name="testSqoopLoadWorkflow" xmlns="uri:oozie:workflow:0.4">
<credentials>
<credential name='hive_credentials' type='hcat'>
<property>
<name>hcat.metastore.uri</name>
<value>${HIVE_THRIFT_URL}</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>${KERBEROS_PRINCIPAL}</value>
</property>
</credential>
</credentials>
<start to="loadSqoopDataAction"/>
<action name="loadSqoopDataAction" cred="hive_credentials">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/tmp/hive-oozie-site.xml</job-xml>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>/tmp/hive-oozie-site.xml</value>
</property>
</configuration>
<command>job --meta-connect ${SQOOP_METASTORE_URL} --exec TEST_SQOOP_LOAD_JOB</command>
</sqoop>
<ok to="end"/>
<error to="kill"/>
</action>
Below is the Sqoop job I am using to import the data:
sqoop job --meta-connect ${SQOOP_METASTORE_URL} --create TEST_SQOOP_LOAD_JOB -- import --connect '${JDBC_URL}' --table testTable -m 1 --append --check-column pkId --incremental append --hive-import --hive-table testHiveTable;
In the mapred logs I am getting the following exception:
72285 [main] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher ends
Please suggest.
This looks like a typical Sqoop import-to-Hive job, so it seems Sqoop has successfully imported the data into HDFS and is failing to load that data into Hive.
Here's some background on what's happening: Oozie launches a separate launcher job (which can execute on any node in your Hadoop cluster) to run the Sqoop command. The Sqoop command starts a separate job to load the data into HDFS. Then, at the end of the Sqoop job, Sqoop runs a Hive script to load that data into Hive.
Since this can run from any node in your Hadoop cluster, the Hive CLI needs to be available on each node and must talk to the same metastore, so the Hive metastore needs to run in remote mode.
The most common problem is that Sqoop cannot talk to the correct metastore. The usual reasons are:
The Hive metastore service is not running. It should run in remote mode as a separate service. Here's a quick way to check whether it is running:
service hive-metastore status
hive-site.xml does not contain hive.metastore.uris. Here's an example hive-site.xml with hive.metastore.uris set:
<configuration>
...
<property>
<name>hive.metastore.uris</name>
<value>thrift://sqoop2.example.com:9083</value>
</property>
...
</configuration>
hive-site.xml is not included in your Sqoop action (or its properties). Try adding your hive-site.xml to a <file> element in your Sqoop action. Here's an example workflow.xml with <file> in it:
<workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
...
<action name="sqoop2hive">
...
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
...
<file>/tmp/hive-site.xml#hive-site.xml</file>
</sqoop>
...
</action>
...
</workflow-app>
This seems to be a bug in Sqoop. I am not sure about the JIRA number. Hortonworks mentioned that the issue is still not resolved even in the HDP 2.2 version.
@abeaamase - I want to try your solution.
Just want to check: does the solution below (adding <file>/tmp/hive-site.xml#hive-site.xml</file> inside the Sqoop action, as in your example) work for a Sqoop + Hive import in one single Oozie job?
If you are using CDH, the problem may be due to Hive metastore jar dependency conflicts.

Sqoop not able to import table

I am running the command below with Sqoop:
sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets
My version of Sqoop: Sqoop 1.4.4.2.0.6.1-101
Hadoop: Hadoop 2.2.0.2.0.6.0-101
Both are taken from the Hortonworks distribution. All the paths like HADOOP_HOME, HCAT_HOME, SQOOP_HOME are set properly. I am able to get the list of databases and the list of tables from the MySQL database by running the list-databases and list-tables commands in Sqoop. I am even able to get data with --query 'select * from widgets', but when I use the --table option I get the error below.
14/02/06 14:02:17 WARN mapred.LocalJobRunner: job_local177721176_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:394)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:233)
at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:236)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:491)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:734)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 13 more
Specify --bindir, the directory where the compiled code and .jar file should be placed.
Without this argument, Sqoop places the generated Java source file in your current working directory and the compiled .class and .jar files in /tmp/sqoop-<username>/compile.
Use the --bindir option and point it to your current working directory:
sqoop import --bindir ./ --connect jdbc:mysql://localhost/hadoopguide --table widgets
The problem was resolved after I copied the .class file from /tmp/sqoop-hduser/compile/ to /home/hduser/ on HDFS and also to the current working directory from which I run Sqoop.
For importing a specific table into HDFS, run:
sqoop import --connect jdbc:mysql://localhost/databasename --username root --password *** --table tablename --bindir /usr/lib/sqoop/lib/ --driver com.mysql.jdbc.Driver --target-dir /directory-name
Make sure that /usr/lib/sqoop/* and /usr/local/hadoop/* are owned by the same user, otherwise you will get an error like "Permission denied".
PS: Make sure you have installed the MySQL Java connector before you run the command. I installed Hadoop version 2.7.3 and connector 5.0.8.
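For example, assuming the connector jar was downloaded to the current directory (the file name depends on the connector version you installed):
cp mysql-connector-java-5.0.8-bin.jar /usr/lib/sqoop/lib/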
Another fix for the ClassNotFoundException is to tell Hadoop to use the user classpath first (-Dmapreduce.job.user.classpath.first=true). This can be set on the command line or in an options file. The top of an import options file would be:
#Options file for Sqoop import
import
-Dmapreduce.job.user.classpath.first=true
This fixed the ClassNotFoundException for me when trying to import data with --as-avrodatafile.
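On the command line, the generic -D options must come right after the tool name, before the Sqoop-specific arguments, for example (connection string taken from the question above):
sqoop import -Dmapreduce.job.user.classpath.first=true --connect jdbc:mysql://localhost/hadoopguide --table widgets --as-avrodatafile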
