Library packages not working with Oozie shell action

Hi, I am running Oozie with a shell script, and that shell script submits SparkR jobs. Whenever the Oozie job runs, I get an error about the SparkR library.
Here is the error:
Stdoutput Running /opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/lib/spark/bin/spark-submit --class edu.berkeley.cs.amplab.sparkr.SparkRRunner --files pi.R --master yarn-client /SparkR-pkg/lib/SparkR/sparkr-assembly-0.1.jar pi.R yarn-client 4
Stdoutput Error in library(SparkR) : there is no package called ‘SparkR’
Stdoutput Execution halted
Exit code of the Shell command 1
<<< Invocation of Shell command completed <<<
My job.properties file:
nameNode=hdfs://ip-172-31-41-199.us-west-2.compute.internal:8020
jobTracker=ip-172-31-41-199.us-west-2.compute.internal:8032
queueName=default
oozie.libpath=hdfs://ip-172-31-41-199.us-west-2.compute.internal:8020/SparkR-pkg/lib/
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=shell_example
oozie.wf.application.path=${oozieProjectRoot}/apps/shell
My workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.1" name="Test">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>script.sh</exec>
<file>oozie-oozi/script.sh#script.sh</file>
<file>/user/karun/examples/pi.R</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Incorrect output</message>
</kill>
<end name="end"/>
</workflow-app>
My shell script:
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/lib/spark
export YARN_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export HADOOP_CMD=/usr/bin/hadoop
/SparkR-pkg/lib/SparkR/sparkR-submit --master yarn-client pi.R yarn-client 4
I don't know how to resolve the issue. Any help will be appreciated.

Related

oozie shell action with spark-submit

I am trying to run spark-submit from a shell wrapper. The job runs fine from the command line but fails when scheduled through Oozie.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments.handle(SparkSubmitArguments.scala:394)
at org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:163)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:97)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Here is my workflow:
<workflow-app name="OozieTest1" xmlns="uri:oozie:workflow:0.5">
<start to="CopyTest"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="CopyTest">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>lib/copy.sh</exec>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/file-list/xxx_xxx_201610.lst</argument>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/sample</argument>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/output</argument>
<argument>IMMUN</argument>
<argument>N</argument>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/resources/script-constants.properties</argument>
<file>hdfs://xxxxxx/user/xxxxxx/oozie-test/lib/copy.sh#copy.sh</file>
<file>hdfs://xxxxxx/user/xxxxxx/oozie-test/lib/xxxx_Integration.jar#xxxx_Integration.jar</file>
<capture-output/>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
It depends on which versions of Spark, Hadoop, and Oozie you use, but most probably you have a dependency issue (a jar is missing). I would really recommend checking your dependencies. In a full working example of this setup, the Hadoop and Spark versions are the following:
<hadoop.version>2.6.0-cdh5.4.7</hadoop.version>
<spark.version>1.3.0-cdh5.4.7</spark.version>
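A sketch of one way to chase this down in the shell wrapper (my own suggestion, not part of the original answer): org/apache/hadoop/fs/FSDataInputStream lives in hadoop-common, so this NoClassDefFoundError usually means the Hadoop jars are missing from spark-submit's classpath. If the Spark build does not bundle Hadoop, putting the output of hadoop classpath on SPARK_DIST_CLASSPATH is one way to rule that out, assuming the hadoop command is available on the node running the action:
#!/bin/bash
# Make the cluster's Hadoop jars visible to spark-submit;
# 'hadoop classpath' prints the full list of Hadoop dependency paths
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# your-app.jar is a placeholder for the real application jar
spark-submit --master yarn-client your-app.jar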

Scheduling a sqoop job in oozie through Shell script using Hue

I am able to run a Sqoop command in Oozie using Hue. But when I try to run the same Sqoop command by placing it in a shell script, I get an error like the one below:
Stdoutput 2016-05-20 10:52:13,241 ERROR [main] sqoop.Sqoop (Sqoop.java:runSqoop(181)) - Got exception running Sqoop:
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
I have included the JDBC jar file just as I did while running the Sqoop command directly. I don't understand why it is not working for the shell script.
Here is the workflow generated by Hue
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5">
<start to="shell-ca31"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="shell-ca31">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
<property>
<name>oozie.use.system.libpath</name>
<value>true</value>
</property>
<property>
<name>oozie.libpath</name>
<value>/user/oozie/libext</value>
</property>
</configuration>
<exec>sqoopoozie.sh</exec>
<file>/user/yxr6907/sqoopoozie.sh#sqoopoozie.sh</file>
<archive>/user/oozie/libext/ojdbc7.jar#ojdbc7.jar</archive>
<capture-output/>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
When you use a shell action, the Sqoop jars are not imported into the classpath. I was able to solve it by adding the jar to the classpath and exporting HADOOP_CLASSPATH; after that, Sqoop works.
Use the following:
1. Ship the jar ojdbc7.jar with the action as a file (the <file> element).
2. Use the following command inside the shell script: export HADOOP_CLASSPATH=${PWD}/ojdbc7.jar
Instead of step 1, you can use the following properties to load the jar into the classpath:
oozie.use.system.libpath=true
oozie.libpath=/path/to/jars
Exporting HADOOP_CLASSPATH is required either way.
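For illustration, here is a minimal sketch of what sqoopoozie.sh could look like with the fix applied (the connection string, password file, and table name are placeholders, not from the original question):
#!/bin/bash
# ojdbc7.jar is shipped via <file>, so it lands in the action's working directory
export HADOOP_CLASSPATH=${PWD}/ojdbc7.jar
# Placeholder import; substitute your own connection details
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username dbuser \
  --password-file /user/yxr6907/sqoop.password \
  --table SOME_TABLE \
  --target-dir /user/yxr6907/sqoop-output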

Oozie Pig action launching error

I am trying to run a very basic Oozie workflow, and I am getting the below error when I give this command:
user@ubuntu:~/surender$ oozie job -oozie http://localhost:11000/oozie /home/user/surender/oozie_demo/job.properties -run
Error:
Error: E0501 : E0501: Could not perform authorization operation, Failed on local exception: java.io.EOFException; Host Details : local host is: "ubuntu/127.0.0.1"; destination host is: "localhost":8020;
My Oozie version is 4.0.0, and I have checked that the Oozie web console is enabled.
This is how I created the Oozie workflow:
I created a directory called oozie_demo and inside it created two files:
1. workflow.xml
2. job.properties
I also created a lib directory and placed the Pig script inside that lib directory.
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
<start to="pig-node"/>
<action name="pig-node">
<pig>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/output/pig/simple_load"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
</configuration>
<script>simple_load.pig</script>
<param>INPUT=/user/${wf:user()}/inputfiles/records.txt</param>
<param>OUTPUT=/user/${wf:user()}//output/pig/simple_load</param>
</pig>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
job.properties
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
oozie_demo=oozie_demo
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/user/oozie_demo
My Pig script:
records = load '/user/user/inputfiles/records.txt' USING PigStorage(',');
store records into '/user/user/output/pig/simple_load' using PigStorage(',');
Could somebody help me with this? I would like to know what went wrong and how I can resolve the issue.
Could you check whether the NameNode is up and running on port 8020?
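As a quick sanity check (a sketch, assuming shell access to the cluster host), you can compare the configured filesystem URI against job.properties and test whether anything is listening on port 8020:
# Print the configured default filesystem; it should match nameNode in job.properties
hdfs getconf -confKey fs.defaultFS
# Test connectivity to the NameNode RPC port (assumes netcat is installed)
nc -zv localhost 8020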

Oozie shell action not running as submitting user

I've written an Oozie workflow that runs a Bash shell script to do some Hive queries and perform some actions on the results. The script runs, but it throws a permission error when accessing some of the HDFS data. The user that submitted the Oozie workflow has permission, but the script is running as the yarn user.
Is it possible to make Oozie execute the script as the user who submitted the workflow? Hive and Java actions both execute as the submitting user; only the shell action behaves differently.
Here's the rough outline of my Oozie action
<action name="start_action"
retry-max="12"
retry-interval="600">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${WorkflowRoot}/hive-site.xml</job-xml>
<exec>script.sh</exec>
<file>${WorkflowRoot}/script.sh</file>
<capture-output />
</shell>
<ok to="next_action"/>
<error to="send_email"/>
</action>
I'm running Oozie 4.1.0 and HDP 2.1.
This issue will occur on any cluster configured with simple security. You have the option to override the default user: including the statement below at the start of the shell script will fix the issue.
export HADOOP_USER_NAME=<name of the submitting user>
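For example, the top of script.sh might then look like this (a sketch; the user name is a placeholder for whoever submitted the workflow):
#!/bin/bash
# Run subsequent HDFS/Hive operations as the submitting user instead of 'yarn'
export HADOOP_USER_NAME=myuser   # placeholder: the submitting user
# ... the Hive queries and follow-up actions go here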
Alternatively, you can set it with the <env-var> element:
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<workflow-app xmlns="uri:oozie:workflow:0.3" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>test.sh</exec>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>/user/root/test.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

Python Oozie Shell Action Failing

I have an Oozie workflow containing a shell action that invokes a Python script, and it is failing with the following error.
Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
The Python script (hello.py) is simple.
print("Hello, World!")
Here is my Oozie workflow.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hello">
<start to="shell-check-hour"/>
<action name="shell-check-hour">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>hello.py</exec>
<file>hdfs://localhost:8020/user/test/hello.py</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Can someone see anything wrong with what I am doing? If I replace the Python script with a shell script, the shell script executes fine (both files are in the same directory). This leads me to believe that the problem is that for whatever reason, Python isn't being recognised by Oozie.
Add a hash-bang (shebang) line to your script.
For example, my script started with:
#!/usr/bin/env python
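With that, the fixed hello.py is just two lines (assuming python is on the PATH of the node that runs the shell action):
#!/usr/bin/env python
print("Hello, World!")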
