oozie Sqoop action fails to import data to hive - hadoop

I am facing issue while executing oozie sqoop action.
In logs I can see that sqoop is able to import data to temp directory then sqoop creates hive scripts to import data.
It fails while importing temp data to hive.
In logs I am not getting any exception.
Below is a sqoop action I am using.
<workflow-app name="testSqoopLoadWorkflow" xmlns="uri:oozie:workflow:0.4">
<credentials>
<credential name='hive_credentials' type='hcat'>
<property>
<name>hcat.metastore.uri</name>
<value>${HIVE_THRIFT_URL}</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>${KERBEROS_PRINCIPAL}</value>
</property>
</credential>
</credentials>
<start to="loadSqoopDataAction"/>
<action name="loadSqoopDataAction" cred="hive_credentials">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/tmp/hive-oozie-site.xml</job-xml>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>/tmp/hive-oozie-site.xml</value>
</property>
</configuration>
<command>job --meta-connect ${SQOOP_METASTORE_URL} --exec TEST_SQOOP_LOAD_JOB</command>
</sqoop>
<ok to="end"/>
<error to="kill"/>
</action>
Below is a sqoop Job I am using to import data.
sqoop job --meta-connect ${SQOOP_METASTORE_URL} --create TEST_SQOOP_LOAD_JOB -- import --connect '${JDBC_URL}' --table testTable -m 1 --append --check-column pkId --incremental append --hive-import --hive-table testHiveTable;
In mapred logs I am getting following exception.
72285 [main] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher ends
Please suggest.

This seems like a typical Sqoop import to Hive job. So it seems like Sqoop has successfully imported data in HDFS and is failing to load that data into Hive.
Here's some background on what's happening... Oozie launches a separate job (which will execute on any node in your hadoop cluster) to run the Sqoop command. The Sqoop command starts a separate job to load data into HDFS. Then, at the end of the Sqoop job, sqoop runs a hive script to load that data into Hive.
Since this is theoretically running from any node in your Hadoop cluster, hive CLI will need to be available on each node and talk to the same metastore. The Hive Metastore will need to run in remote mode.
The most normal problem is because Sqoop cannot talk to the correct metastore. The main reasons for this are normally:
Hive metastore service is not running. It should be running in remote mode and a separate service should be started. Here's a quick way to check if its running:
service hive-metastore status
hive-site.xml does not contain hive.metastore.uris. Here's an example hive-site.xml with hive.metastore.uris set:
<configuration>
...
<property>
<name>hive.metastore.uris</name>
<value>thrift://sqoop2.example.com:9083</value>
</property>
...
</configuration>
hive-site.xml is not included in your Sqoop action (or its properties). Try adding your hive-site.xml to a <file> element in your Sqoop action. Here's an example workflow.xml with <file> in it:
<workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
...
<action name="sqoop2hive">
...
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
...
<file>/tmp/hive-site.xml#hive-site.xml</file>
</sqoop>
...
</action>
...
</workflow-app>

This seems to be a bug in Sqoop. Am not sure about the JIRA#. Hortonworks mentioned that the issue is still not resolved even in HDP 2.2 version.

#abeaamase - I want try to use your solution.
Just want to check if below solution works good for sqoop + Hive import in one single oozie job?
...
...
...
/tmp/hive-site.xml#hive-site.xml
...
...

If you are using cdh then problem may be due to hive metastore jar dependency conflicts.

Related

oozie sqoop action fails to import

I am facing issue while executing oozie sqoop action. In logs I can see that sqoop is able to import data to temp directory then sqoop creates hive scripts to import data.
It fails while importing data to hive.
Below is a sqoop action I am using.
<action name="import" retry-max="2" retry-interval="5">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${jobQueue}</value>
</property>
</configuration>
<arg>import</arg>
<arg>-D</arg>
<arg>sqoop.mapred.auto.progress.max=300000</arg>
<arg>-D</arg>
<arg>map.retry.exponentialBackOff=TRUE</arg>
<arg>-D</arg>
<arg>map.retry.numRetries=3</arg>
<arg>--options-file</arg>
<arg>${odsparamFileName}</arg>
<arg>--table</arg>
<arg>${odsTableName}</arg>
<arg>--where</arg>
<arg>${ods_data_pull_column} BETWEEN TO_DATE(${wf:actionData('getDates')['prevMonthBegin']},'YYYY-MM-DD hh24:mi:ss') AND TO_DATE(${wf:actionData('prevMonthEnd')['endDate']},'YYYY-MM-DD hh24:mi:ss')</arg>
<arg>--hive-import</arg>
<arg>--hive-overwrite</arg>
<arg>--hive-table</arg>
<arg>${stgTable}</arg>
<arg>--hive-drop-import-delims</arg>
<arg>--warehouse-dir</arg>
<arg>${sqoopStgDir}</arg>
<arg>--delete-target-dir</arg>
<arg>--null-string</arg>
<arg>\\N</arg>
<arg>--null-non-string</arg>
<arg>\\N</arg>
<arg>--compress</arg>
<arg>--compression-codec</arg>
<arg>gzip</arg>
<arg>--num-mappers</arg>
<arg>1</arg>
<arg>--verbose</arg>
<file>${odsSqoopConnectionParamsFileLocation}</file>
</sqoop>
<ok to="rev"/>
<error to="fail"/>
</action>
Below is the error i am getting in mapred logs
20078 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '1=1' and upper bound '1=1'
Heart beat
Heart beat
Heart beat
Heart beat
151160 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 0 bytes in 135.345 seconds (0 bytes/sec)
151164 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 0 records.
151164 [main] ERROR org.apache.sqoop.tool.ImportTool - Error during import: Import job failed!
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Please suggest
You can import the table to hdfs path using --target-dir and set the location of your hive table to point that path. I fixed it using this approach. Hope it helps you as well.

Running Hive Query in Spark through Oozie 4.1.0.3

Getting table not found exception while running Hive Query in Spark using Oozie version 4.1.0.3, as java action.
Copied hive-site.xml and hive-default.xml from hdfs path
workflow.xml used:
<start to="scala_java"/>
<action name="scala_java">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive- site.xml</job-xml>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-default.xml</value>
</property>
<property>
<name>pool.name</name>
<value>${etlPoolName}</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>${QUEUE_NAME}</value>
</property>
</configuration>
<main-class>org.apache.spark.deploy.SparkSubmit</main-class>
<arg>--master</arg>
<arg>yarn-cluster</arg>
<arg>--class</arg>
<arg>HiveFromSparkExample</arg>
<arg>--deploy-mode</arg>
<arg>cluster</arg>
<arg>--queue</arg>
<arg>testq</arg>
<arg>--num-executors</arg>
<arg>64</arg>
<arg>--executor-cores</arg>
<arg>5</arg>
<arg>--jars</arg>
<arg>datanucleus-api-jdo-3.2.6.jar,datanucleus-core-3.2.10.jar,datanucleus- rdbms-3.2.9.jar</arg>
<arg>TEST-0.0.2-SNAPSHOT.jar</arg>
<file>TEST-0.0.2-SNAPSHOT.jar</file>
</java>
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Table not found test_hive_spark_t1)
Exception in thread "Driver" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found test_hive_spark_t1
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:79)
at org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:255)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:137)
at org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:255)
A. The X-default config files are just for user information; they are created at install time, from the hard-coded defaults in the JARs.
It's the X-site config files that contain useful information, e.g. how to connect to the Metastore (default for that is "just start an embedded Derby DB with no data inside"... might explain the "table not found message!
B. Hadoop components search for X-site config files in the CLASSPATH; and if they don't find them there, they silently fallback to default.
So you must tell Oozie to download them to local CWD via <file> instructions.
(Except for an explicit Hive Action that uses another, explicit, convention for its specific hive-site but that's not the case here)
hive-default.xml is not needed.
Create a custom hive-site.xml and which has hive.metastore.uris property alone.
Pass the custom hive-site.xml in --files hive-site.xml as spark Arguments.
Remove the job-xml property and oozie-hive-defaults.

Error Sqoop using Oozie

I have a problem with a workflow oozie
I import a file using Sqoop with the command
sqoop import --connect jdbc:oracle:thin#:ip:sid --username --password --target-dir
From the command line it works but scheduled in oozie gives me the error:
Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Why am getting this error?
Could you add more information to the question? How does oozie workflow.xml look like?
I would guess that you don't have
<property>
<name>oozie.use.system.libpath</name>
<value>true</value>
</property>
in your workflow.xml file which is resulting in sqoop jar not being found on the classpath. Add that to your workflow.xml and make sure your
/user/oozie/share/lib/
has the sqoop jar. That should fix your issue.

Oozie variable[user] cannot ber resolved

I'm trying to use Oozie's Hive action in Hue. My Hive script is very simple:
create table test.test_2 as
select * from test.test
This Oozie action has only 3 steps:
start
hive_query
end
My job.properties:
jobTracker worker-1:8032
mapreduce.job.user.name hue
nameNode hdfs://batchlayer
oozie.use.system.libpath true
oozie.wf.application.path hdfs://batchlayer/user/hue/oozie/workspaces/_hue_-oozie-4-1425575226.04
user.name hue
I add hive-site.xml two times - as file and as job.xml. Oozie action starts and on second step stops. Job is 'accepted'. But in hue console I've got an error:
variable[user] cannot ber resolved
I'm using Apache Oozie 4.2, Apache Hive 0.14 and Hue 3.7 (from Github).
UPDATE:
This is my workflow.xml:
bash-4.1$ bin/hdfs dfs -cat /user/hue/oozie/workspaces/*.04/work*
<workflow-app name="ccc" xmlns="uri:oozie:workflow:0.4">
<start to="ccc"/>
<action name="ccc">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/user/hue/hive-site.xml</job-xml>
<script>/user/hue/hive_test.hql</script>
<file>/user/hue/hive-site.xml#hive-site.xml</file>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Tried running a sample hive action in Oozie following similar steps as you, and was able to resolve error faced by you using following steps
Remove the add for hive-site.xml
Add following line to your
job.properties oozie.libpath=${nameNode}/user/oozie/share/lib
Increase visibility of your hive-site.xml file kept in HDFS. Maybe
you have very restrictive privileges over it (in my case 500)
With this both the [user] variable cannot be resolved and subsequent errors got resolved.
Hope it helps.
This message can be really misleading. You should check yarn logs and diagnostics.
In my case it was configuration settings regarding reduce task and container memory. By some error container memory limit was lower than single reduce task memory limit. After looking into yarn application logs I saw the true cause in 'diagnostics' section, which was:
REDUCE capability required is more than the supported max container capability in the cluster. Killing the Job. reduceResourceRequest: <memory:8192, vCores:1> maxContainerCapability:<memory:5413, vCores:4>
Regards

Sqoop - Hive import using Oozie failed

I am trying to execute a sqoop import from oracle to hive, but the job fails with error
WARN [main] conf.HiveConf (HiveConf.java:initialize(2472)) - HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
I have all the jar files in place
hive-site.xml is also in place with hive metastore configuration
<property>
<name>hive.metastore.uris</name>
<value>thrift://sv2lxgsed01.xxxx.com:9083</value>
</property>
I am able to run a sqoop import(using oozie) to HDFS successfully.
I am also able to execute a hive script(using oozie) successfully
I can also execute sqoop-hive import from commandline , but the same
command fails when I execute it using oozie
My workflow.xml is as below
<workflow-app name="WorkflowWithSqoopAction" xmlns="uri:oozie:workflow:0.1">
<start to="sqoopAction"/>
<action name="sqoopAction">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect
jdbc:oracle:thin:#//sv2axcrmdbdi301.xxx.com:1521/DI3CRM --username xxxxxxx --password xxxxxx--table SIEBEL.S_ORG_EXT --hive-table eg.EQX_EG_CRM_S_ORG_EXT --hive-import -m1</command>
<file>/user/oozie/oozieProject/workflowSqoopAction/hive-site.xml</file>
</sqoop>
<ok to="end"/>
<error to="killJob"/>
</action>
<kill name="killJob">
<message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
</kill>
<end name="end" />
</workflow-app>
I can also find the data being loaded in HDFS.
You need to do 2 things
1) Copy hive-site.xml in the oozie workflow directory 2) In your Hive action tell oozie that use my hive-site.xml

Resources