Accessing Vertica Database through Oozie sqoop - sqoop

I have written an Oozie workflow to access an HP Vertica database through Sqoop. This is on a Cloudera VM. I am getting the following error in the YARN logs after running it:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: dbDriver
java.lang.RuntimeException: Could not load db driver class: dbDriver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:848)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.tool.EvalSqlTool.run(EvalSqlTool.java:64)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
This is a snippet from the jobprops file:
"dbDriver=com.vertica.jdbc.Driver
dbHost=host***.assist.***
dbName=vertica247
dbPassword=*****
dbPort=5433
dbSchema=simod_chat
dbStagingSchema=simodstg_chat
dbUser=vertica
What should I specify for --connection-manager? When I run the same workflow outside the VM, it runs fine without the --connection-manager argument.

As the error states:
Could not load db driver class: dbDriver
There are likely two problems:
The JDBC URL is probably incorrect
The JDBC Jar needs to be included in the workflow
For the JDBC URL, make sure it looks like this:
jdbc:vertica://VerticaHost:portNumber/databaseName
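For example, with the properties from your jobprops file (a sketch, assuming dbHost, dbPort and dbName are substituted into the Sqoop command), that would be:
jdbc:vertica://${dbHost}:${dbPort}/${dbName}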
For the JDBC jar, it needs to be included with the workflow. Check out this article for a brief example of how to do this with HBase. TL;DR: when you run Sqoop through Oozie, you have to include the driver jar in the workflow:
<workflow-app name="sqoop-import" xmlns="uri:oozie:workflow:0.4">
<start to="sqoop-import"/>
<action name="sqoop-import">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect jdbc:vertica://VerticaHost:portNumber/databaseName --username test --password test --table test</command>
<file>/user/admin/vertica-jdbc.jar#vertica-jdbc.jar</file>
</sqoop>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Note the line:
<file>/user/admin/vertica-jdbc.jar#vertica-jdbc.jar</file>
The driver jar will then automatically be included in your Sqoop job.
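As for --connection-manager: you normally do not need to specify one for Vertica. The error shows the literal string dbDriver being used as the driver class, which suggests the ${dbDriver} property is not being substituted; passing the class name explicitly with --driver makes Sqoop fall back to org.apache.sqoop.manager.GenericJdbcManager, which works for Vertica. A command-line sketch using the same placeholders as above (not your real host):
sqoop eval --connect jdbc:vertica://VerticaHost:portNumber/databaseName --driver com.vertica.jdbc.Driver --username vertica -P --query "SELECT 1"
Inside the Oozie action, prefer separate <arg> elements over a single <command> so the quoted query survives intact.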

Related

oozie sqoop action fails to import

I am facing an issue while executing an Oozie Sqoop action. In the logs I can see that Sqoop is able to import data to a temp directory, and then Sqoop creates Hive scripts to import the data.
It fails while importing the data into Hive.
Below is the Sqoop action I am using.
<action name="import" retry-max="2" retry-interval="5">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${jobQueue}</value>
</property>
</configuration>
<arg>import</arg>
<arg>-D</arg>
<arg>sqoop.mapred.auto.progress.max=300000</arg>
<arg>-D</arg>
<arg>map.retry.exponentialBackOff=TRUE</arg>
<arg>-D</arg>
<arg>map.retry.numRetries=3</arg>
<arg>--options-file</arg>
<arg>${odsparamFileName}</arg>
<arg>--table</arg>
<arg>${odsTableName}</arg>
<arg>--where</arg>
<arg>${ods_data_pull_column} BETWEEN TO_DATE(${wf:actionData('getDates')['prevMonthBegin']},'YYYY-MM-DD hh24:mi:ss') AND TO_DATE(${wf:actionData('prevMonthEnd')['endDate']},'YYYY-MM-DD hh24:mi:ss')</arg>
<arg>--hive-import</arg>
<arg>--hive-overwrite</arg>
<arg>--hive-table</arg>
<arg>${stgTable}</arg>
<arg>--hive-drop-import-delims</arg>
<arg>--warehouse-dir</arg>
<arg>${sqoopStgDir}</arg>
<arg>--delete-target-dir</arg>
<arg>--null-string</arg>
<arg>\\N</arg>
<arg>--null-non-string</arg>
<arg>\\N</arg>
<arg>--compress</arg>
<arg>--compression-codec</arg>
<arg>gzip</arg>
<arg>--num-mappers</arg>
<arg>1</arg>
<arg>--verbose</arg>
<file>${odsSqoopConnectionParamsFileLocation}</file>
</sqoop>
<ok to="rev"/>
<error to="fail"/>
</action>
Below is the error I am getting in the MapReduce logs:
20078 [main] DEBUG org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - Creating input split with lower bound '1=1' and upper bound '1=1'
Heart beat
Heart beat
Heart beat
Heart beat
151160 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 0 bytes in 135.345 seconds (0 bytes/sec)
151164 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 0 records.
151164 [main] ERROR org.apache.sqoop.tool.ImportTool - Error during import: Import job failed!
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Please suggest a fix.
You can import the table to an HDFS path using --target-dir and set the location of your Hive table to point to that path. I fixed it using this approach; hope it helps you as well.
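A rough sketch of that approach (the staging path and table name here are hypothetical, not taken from your workflow): drop the --hive-import / --hive-overwrite arguments, import plainly to HDFS,
<arg>--target-dir</arg>
<arg>/staging/${odsTableName}</arg>
and point the Hive table at that path once, for example:
ALTER TABLE my_stg_table SET LOCATION 'hdfs:///staging/my_table';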

Errors when using Sqoop action in Oozie editor (Hue)

I am trying to use the Sqoop action in the Oozie editor in Hue; however, I can't get it to work.
Here's what I have tried so far.
I put everything in arguments instead of the command (http://alvincjin.blogspot.com.au/2014/06/create-sqoop-action-in-oozie-using-hue.html).
Further, I am trying to connect to Teradata, so I've placed the JDBC jars in HDFS and have added them in Files.
This is what the current workflow looks like in the editor:
[screenshot: the Sqoop action as configured in the Hue editor]
The workflow definition is:
<workflow-app name="Sqoop_test" xmlns="uri:oozie:workflow:0.5">
<start to="sqoop-b20d"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="sqoop-b20d">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>development</value>
</property>
<property>
<name>mapred.job.name</name>
<value>test_sqoop</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>0</value>
</property>
</configuration>
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:teradata://XXXXX</arg>
<arg>--query</arg>
<arg>select count(*) from XXXXX</arg>
<arg>--fetch-size</arg>
<arg>10000</arg>
<arg>--num-mappers</arg>
<arg>1</arg>
<arg>--hive-table-name</arg>
<arg>XXXXX.tmp_sqoop_test</arg>
<arg>--hive-import</arg>
<arg>--hive-overwrite</arg>
<arg>--target-dir</arg>
<arg>/user/dXXXXX/digital/test/tmp_sqoop_test</arg>
<arg>--username</arg>
<arg>XXXXX</arg>
<arg>--password</arg>
<arg>XXXXX</arg>
<file>/user/hue/oozie/workspaces/digital/lib/terajdbc4.jar#terajdbc4.jar</file>
<file>/user/hue/oozie/workspaces/digital/lib/teradata-connector-1.3.4-hadoop220.jar#teradata-connector-1.3.4-hadoop220.jar</file>
</sqoop>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
However, I get this error:
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(296)) - Error parsing arguments for import:
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --hive-table-name
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --hive-table-name
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: XXXXX.tmp_sqoop_test
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: tdcprdr_app_digital.tmp_sqoop_test
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --hive-import
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --hive-import
2786 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --hive-overwrite
2016-01-06 14:13:52,114 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --hive-overwrite
2787 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --target-dir
2016-01-06 14:13:52,115 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(299)) - Unrecognized argument: --target-dir
...
I was under the impression that this error can be resolved by placing everything in arguments.
The same code works when run through a shell script. I've tried placing the import command and connection string in the command section, but that doesn't even run. I've also tried creating a minimalistic Sqoop action, with just the query and connect statement, as follows:
<workflow-app name="Sqoop_minimal" xmlns="uri:oozie:workflow:0.5">
<start to="sqoop-eeeb"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="sqoop-eeeb">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:teradata://tdXXXXX</arg>
<arg>--query</arg>
<arg>select count(*) from XXXXX</arg>
<arg>--target-dir</arg>
<arg>/user/dXXXXX/digital/test/tmp_sqoop_test</arg>
<arg>--username</arg>
<arg>XXXXX</arg>
<arg>--password</arg>
<arg>XXXXX</arg>
<file>/user/hue/oozie/workspaces/digital/lib/teradata-connector-1.3.4-hadoop220.jar#teradata-connector-1.3.4-hadoop220.jar</file>
<file>/user/hue/oozie/workspaces/digital/lib/terajdbc4.jar#terajdbc4.jar</file>
</sqoop>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
With this workflow, I get a very vague error as follows:
>>> Invoking Sqoop command line now >>>
2287 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-01-06 14:57:48,381 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(175)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2324 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5.3.0.0.0-249
2016-01-06 14:57:48,418 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(92)) - Running Sqoop version: 1.4.5.3.0.0.0-249
2339 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
2016-01-06 14:57:48,433 WARN [main] tool.BaseSqoopTool (BaseSqoopTool.java:applyCredentialsOptions(1014)) - Setting your password on the command-line is insecure. Consider using -P instead.
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
The Oozie version is 4.1.0.3.0.0.0-249.
I've tried searching for a solution online, but no luck.
Any help would be appreciated. Thank you!
I have already seen and tried these links:
https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Sqoop-fails-with-quot-Error-parsing-arguments-for-import-quot/td-p/31930
http://stackoverflow.com/questions/25770698/sqoop-free-form-query-causing-unrecognized-arguments-in-hue-oozie
There is no such Sqoop argument as
--hive-table-name
Use
--hive-table instead. It should not show the Unrecognized argument error now.
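In your workflow that means changing the pair of <arg> elements to (a sketch, keeping your placeholder table name):
<arg>--hive-table</arg>
<arg>XXXXX.tmp_sqoop_test</arg>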

Scheduling/running mahout command in oozie

I'm trying to run the Mahout command seq2sparse through the Oozie scheduler, but it is giving an error.
I tried running the Mahout command using Oozie shell actions, but nothing worked.
Following is the Oozie workflow:
<action name="mahoutSeq2Sparse">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>mahout seq2sparse</exec>
<argument>-i</argument>
<argument>${nameNode}/tmp/Clustering/seqOutput</argument>
<argument>-o</argument>
<argument>${nameNode}/tmp/Clustering/seqToSparse</argument>
<argument>-ow</argument>
<argument>-nv</argument>
<argument>-x</argument>
<argument>100</argument>
<argument>-n</argument>
<argument>2</argument>
<argument>-wt</argument>
<argument>tf</argument>
<capture-output/>
</shell>
<ok to="brandCanopyInitialCluster" />
<error to="fail" />
</action>
I also tried creating a shell script and running it in Oozie:
<action name="mahoutSeq2Sparse">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<file>${EXEC}#${EXEC}</file>
</shell>
<ok to="brandCanopyInitialCluster" />
<error to="fail" />
</action>
with job.properties as
nameNode=hdfs://abc02:8020
jobTracker=http://abc02:8050/
clusteringJobInput=hdfs://abc02:8020/tmp/Activity/000000_0
queueName=default
oozie.wf.application.path=hdfs://abc02:8020/tmp/workflow/
oozie.use.system.libpath=true
EXEC=generatingBrandSparseFile.sh
and generatingBrandSparseFile.sh is
export INPUT_PATH="hdfs://abc02:8020/tmp/Clustering/seqOutput"
export OUTPUT_PATH="hdfs://abc02:8020/tmp/Clustering/seqToSparse"
sudo -u hdfs hadoop fs -chmod -R 777 "hdfs://abc02:8020/tmp/Clustering/seqOutput"
mahout seq2sparse -i ${INPUT_PATH} -o ${OUTPUT_PATH} -ow -nv -x 100 -n 2 -wt tf
sudo -u hdfs hadoop fs -chmod -R 777 ${OUTPUT_PATH}
but none of the options is working.
The error with the latter one is:
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
sudo: no tty present and no askpass program specified
15/06/05 12:23:59 WARN driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only
15/06/05 12:24:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
For the sudo: no tty present error, I have disabled requiretty in /etc/sudoers:
Defaults !requiretty
Mahout is installed on the node where the Oozie server is installed.
Also, the following Oozie workflow is not valid:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="map-reduce-wf">
<action name="mahoutSeq2Sparse">
<ssh>
<host>rootUserName#abc05.ad.abc.com<host>
<command>mahout seq2sparse</command>
<args>-i</arg>
<args>${nameNode}/tmp/Clustering/seqOutput</arg>
<args>-o</arg>
<args>${nameNode}/tmp/Clustering/seqToSparse</arg>
<args>-ow</args>
<args>-nv</args>
<args>-x</args>
<args>100</args>
<args>-n</args>
<args>2</args>
<args>-wt</args>
<args>tf</args>
<capture-output/>
</ssh>
<ok to="brandCanopyInitialCluster" />
<error to="fail" />
</action>
Error- Error: E0701 : E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'ssh'. One of '{"uri:oozie:workflow:0.4":map-reduce, "uri:oozie:workflow:0.4":pig, "uri:oozie:workflow:0.4":sub-workflow, "uri:oozie:workflow:0.4":fs, "uri:oozie:workflow:0.4":java, WC[##other:"uri:oozie:workflow:0.4"]}' is expected.
Will installing Mahout on all the nodes help? (Oozie can run the script on any node.)
Is there a way to make Mahout available on the Hadoop cluster?
Any other solution is also welcome.
Thanks in advance.
Edit:
I have changed the approach slightly, and now I am calling the seq2sparse class directly. The workflow is:
<action name="mahoutSeq2Sparse">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<main-class>org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles</main-class>
<arg>-i</arg>
<arg>${nameNode}/tmp/OozieData/Clustering/seqOutput</arg>
<arg>-o</arg>
<arg>${nameNode}/tmp/OozieData/Clustering/seqToSparse</arg>
<arg>-ow</arg>
<arg>-nv</arg>
<arg>-x</arg>
<arg>100</arg>
<arg>-n</arg>
<arg>2</arg>
<arg>-wt</arg>
<arg>tf</arg>
</java>
<ok to="CanopyInitialCluster"/>
<error to="fail"/>
</action>
The job is still not running; the error is:
>>> Invoking Main class now >>>
Main class : org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles
Arguments :
-i
hdfs://abc:8020/tmp/OozieData/Clustering/seqOutput
-o
hdfs://abc:8020/tmp/OozieData/Clustering/seqToSparse
-ow
-nv
-x
100
-n
2
-wt
tf
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, java.lang.IllegalStateException: Job failed!
org.apache.oozie.action.hadoop.JavaMainException: java.lang.IllegalStateException: Job failed!
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:58)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalStateException: Job failed!
at org.apache.mahout.vectorizer.DictionaryVectorizer.startWordCounting(DictionaryVectorizer.java:368)
at org.apache.mahout.vectorizer.DictionaryVectorizer.createTermFrequencyVectors(DictionaryVectorizer.java:179)
at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:288)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:55)
... 15 more
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://vchniecnveg02:8020/user/root/oozie-oozi/0000054-150604142118313-oozie-oozi-W/mahoutSeq2Sparse--java/action-data.seq
Oozie Launcher ends
These errors in Oozie are very frustrating. In my experience, most of them are caused by a typo in the XML or by the parameter order.
On your last workflow, you didn't close the host tag:
<host>rootUserName#abc05.ad.abc.com<host>
should be
<host>rootUserName#abc05.ad.abc.com</host>
For the shell error, I first recommend using schema version 0.2 (defined here: https://oozie.apache.org/docs/4.0.0/DG_ShellActionExtension.html#AE.A_Appendix_A_Shell_XML-Schema) and removing all parameters and everything not needed just to start the action (do not worry about the results yet).
You need to use:
<shell xmlns="uri:oozie:shell-action:0.2">
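A minimal sketch of such a trimmed-down action (an assumption: the mahout binary must be on the PATH of the node that runs the launcher; note that <exec> should contain only the executable, with seq2sparse and its options passed as separate <argument> elements):
<action name="mahoutSeq2Sparse">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>mahout</exec>
<argument>seq2sparse</argument>
<argument>-i</argument>
<argument>${nameNode}/tmp/Clustering/seqOutput</argument>
<argument>-o</argument>
<argument>${nameNode}/tmp/Clustering/seqToSparse</argument>
<capture-output/>
</shell>
<ok to="brandCanopyInitialCluster"/>
<error to="fail"/>
</action>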

Oozie variable[user] cannot be resolved

I'm trying to use Oozie's Hive action in Hue. My Hive script is very simple:
create table test.test_2 as
select * from test.test
This Oozie workflow has only 3 steps:
start
hive_query
end
My job.properties:
jobTracker=worker-1:8032
mapreduce.job.user.name=hue
nameNode=hdfs://batchlayer
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs://batchlayer/user/hue/oozie/workspaces/_hue_-oozie-4-1425575226.04
user.name=hue
I add hive-site.xml two times: as a file and as job.xml. The Oozie action starts and then stops on the second step. The job is 'accepted', but in the Hue console I get this error:
variable[user] cannot be resolved
I'm using Apache Oozie 4.2, Apache Hive 0.14 and Hue 3.7 (from Github).
UPDATE:
This is my workflow.xml:
bash-4.1$ bin/hdfs dfs -cat /user/hue/oozie/workspaces/*.04/work*
<workflow-app name="ccc" xmlns="uri:oozie:workflow:0.4">
<start to="ccc"/>
<action name="ccc">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/user/hue/hive-site.xml</job-xml>
<script>/user/hue/hive_test.hql</script>
<file>/user/hue/hive-site.xml#hive-site.xml</file>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I tried running a sample Hive action in Oozie following similar steps to yours, and was able to resolve the error you are facing using the following steps:
1) Remove the <file> entry that adds hive-site.xml.
2) Add the following line to your job.properties: oozie.libpath=${nameNode}/user/oozie/share/lib
3) Increase the visibility of your hive-site.xml file kept in HDFS; maybe you have very restrictive permissions on it (in my case 500).
With this, both the '[user] variable cannot be resolved' error and the subsequent errors got resolved.
Hope it helps.
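For the permissions part, something like this is usually enough to make the file readable (a sketch, assuming the path from the workflow above):
hdfs dfs -chmod 644 /user/hue/hive-site.xml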
This message can be really misleading. You should check the YARN logs and diagnostics.
In my case it was a configuration problem with the reduce task and container memory settings: by mistake, the container memory limit was lower than the memory limit of a single reduce task. After looking into the YARN application logs I saw the true cause in the 'diagnostics' section, which was:
REDUCE capability required is more than the supported max container capability in the cluster. Killing the Job. reduceResourceRequest: <memory:8192, vCores:1> maxContainerCapability:<memory:5413, vCores:4>
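For reference, the two settings that have to be consistent look roughly like this (illustrative values only; the reduce request must fit within the cluster's maximum container size):
<!-- job side: memory requested per reduce task -->
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
<!-- cluster side (yarn-site.xml): maximum container size -->
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>5413</value>
</property>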
Regards

Sqoop - Hive import using Oozie failed

I am trying to execute a Sqoop import from Oracle to Hive, but the job fails with this error:
WARN [main] conf.HiveConf (HiveConf.java:initialize(2472)) - HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
I have all the jar files in place.
hive-site.xml is also in place with the Hive metastore configuration:
<property>
<name>hive.metastore.uris</name>
<value>thrift://sv2lxgsed01.xxxx.com:9083</value>
</property>
I am able to run a Sqoop import (using Oozie) to HDFS successfully.
I am also able to execute a Hive script (using Oozie) successfully.
I can also execute the Sqoop-Hive import from the command line, but the same command fails when I execute it using Oozie.
My workflow.xml is as below
<workflow-app name="WorkflowWithSqoopAction" xmlns="uri:oozie:workflow:0.1">
<start to="sqoopAction"/>
<action name="sqoopAction">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect
jdbc:oracle:thin:#//sv2axcrmdbdi301.xxx.com:1521/DI3CRM --username xxxxxxx --password xxxxxx --table SIEBEL.S_ORG_EXT --hive-table eg.EQX_EG_CRM_S_ORG_EXT --hive-import -m1</command>
<file>/user/oozie/oozieProject/workflowSqoopAction/hive-site.xml</file>
</sqoop>
<ok to="end"/>
<error to="killJob"/>
</action>
<kill name="killJob">
<message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
</kill>
<end name="end" />
</workflow-app>
I can also find the data being loaded in HDFS.
You need to do 2 things:
1) Copy hive-site.xml into the Oozie workflow directory. 2) In your Sqoop action, tell Oozie to use that hive-site.xml.
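A minimal sketch of what that could look like (assumptions: hive-site.xml has been copied into the workflow application directory so the relative path resolves against it, and your sqoop action schema accepts <job-xml>; the command itself stays as in your workflow):
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<command>import --connect ... (same command as above)</command>
<file>hive-site.xml#hive-site.xml</file>
</sqoop>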
