Oozie shell action not running as submitting user

Oozie shell action not running as submitting user - shell

I've written an Oozie workflow that runs a BASH shell script to do some hive queries and perform some actions on the results. The script runs but throws a permission error when accessing some of the HDFS data. The user that submitted the Oozie workflow has permission but the script is running as the yarn user.
Is it possible to make Oozie execute the script as the user who submitted the workflow? Hive and Java actions both execute as the submitted user, just shell is behaving differently.
Here's the rough outline of my Oozie action
<action name="start_action"
retry-max="12"
retry-interval="600">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${WorkflowRoot}/hive-site.xml</job-xml>
<exec>script.sh</exec>
<file>${WorkflowRoot}/script.sh</file>
<capture-output />
</shell>
<ok to="next_action"/>
<error to="send_email"/>
</action>
I'm running Oozie 4.1.0 and HDP 2.1.

This issue will occur in all cluster that are configured using Simple Security. You've an option to override the default configuration. Include the below statement at the starting of the shell script will fix this issue.
export HADOOP_USER_NAME=<Name of submitted user>;

you can make run with help of env-var
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<workflow-app xmlns="uri:oozie:workflow:0.3" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>test.sh</exec>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>/user/root/test.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

Related

How to mark an Oozie workflow action's status as OK

I am using Apache oozie. I want to mark the status of one of the shell action as OK, in my oozie workflow. It is in Running state.
Can we please share the command to use in Apache Oozie to do this.

You don't need to explicitly set the status of an Action. Oozie automatically does that for you based on the action/task execution. For instance, let's say you have shell action that looks something like this:
<workflow-app
xmlns="uri:oozie:workflow:0.3" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell
xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>some-script.sh</exec>
<file>/user/src/some-script.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
If the /user/src/some-script.sh execution is successful, Oozie will mark the action status as ok and successfully ends the job. On the other hand if the script execution encounters any error, it will be marked as error immediately kill the job and directed. If you're looking for not to kill the job due to any abnormal execution of the code in your script, you can create another action and direct Oozie to follow that execution path instead of immediately killing the workflow. Checkout more about Oozie Shell Action.

oozie over hive to fetch the data from table

I am trying to do automation through oozie over hive.I wrote simple hive query for creation of table and select queries on that table.When I submitted the same script.Script goes to running mode and doesn't execute.I checked the yarn application -list ,job was hanged on 95%.Hive table had been created successfully but not able to fetch data from table.Please let me know how to resolve this problem.
Thanks in Advance.
Workflow.xml
<action name="hive2-node">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive2"/>
<mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<jdbc-url>${jdbcURL}</jdbc-url>
<script>script.q</script>
<param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>
<param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive2</param>
</hive2>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
script.q
job.properties
nameNode=hdfs://...:8020
jobTracker=...:8050
queueName=default
jdbcURL=jdbc:hive2://..*.:10000/default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/hive2

oozie shell action with spark-submit

i am trying to run spark-submit from a shell wrapper. while the job runs fine from command line but failed when scheduling through oozie.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments.handle(SparkSubmitArguments.scala:394)
at org.apache.spark.launcher.SparkSubmitOptionParser.parse(SparkSubmitOptionParser.java:163)
at org.apache.spark.deploy.SparkSubmitArguments.(SparkSubmitArguments.scala:97)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
here is my workflow:
<workflow-app name="OozieTest1" xmlns="uri:oozie:workflow:0.5">
<start to="CopyTest"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="CopyTest">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>lib/copy.sh</exec>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/file-list/xxx_xxx_201610.lst</argument>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/sample</argument>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/output</argument>
<argument>IMMUN</argument>
<argument>N</argument>
<argument>hdfs://xxxxxx/user/xxxxxx/oozie-test/resources/script-constants.properties</argument>
<file>hdfs://xxxxxx/user/xxxxxx/oozie-test/lib/copy.sh#copy.sh</file>
<file>hdfs://xxxxxx/user/xxxxxx/oozie-test/lib/xxxx_Integration.jar#xxxx_Integration.jar</file>
<capture-output/>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>

It depends what version of spark, hadoop and oozie you use. But most probably you have some dependency issues. (jar is missing) I would really recommend to check your dependencies. Here you can find the full working example here:
In this example the hadoop and spark versions are following:
<hadoop.version>2.6.0-cdh5.4.7</hadoop.version>
<spark.version>1.3.0-cdh5.4.7</spark.version>

Oozie Workflow with Archive Action

I would like to make an oozie workflow where the final step of success would be to "Archive" the results.
The command in the shell to do it is
hadoop archive -archiveName=XXX.har -p /some/random/parent directorToArhive pathToArchiveDestination
I have tried the following
<workflow-app name="HARD_CODED_ARCHIVE_TEST" xmlns="uri:oozie:workflow:0.4">
<start to="archive"/>
<action name="archive">
<archive archiveName="xxx.har" src="/root/src/dir" dest="/path/to/desired/archive/location"/>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
The Error I get is something like the following:
WARNING: Exception in Runloop of thread: main with message: E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'archive'. One of '{"uri:oozie:workflow:0.4":map-reduce, "uri:oozie:workflow:0.4":pig, "uri:oozie:workflow:0.4":sub-workflow, "uri:oozie:workflow:0.4":fs, "uri:oozie:workflow:0.4":java, WC[##other:"uri:oozie:workflow:0.4"]}' is expected.
So it is very clear that I can't do this. because the oozie workflow schema does not support the "archive" action.
I really don't want to run this via a cron as I would like to archive immediately after a workflow completes successfully how do I do this.

Try this:
<action name="archive">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>org.apache.hadoop.tools.HadoopArchives</main-class>
<arg>-archiveName</arg>
<arg>${YourArchiveName}.har</arg>
<arg>-p</arg>
<arg>${FilesParentDirectory}</arg>
<arg>${SrcDirectory}</arg>
<arg>${DestDirectory}</arg>
</java>
<ok to="end"/>
<error to="error"/>
</action>
All you need is the hadoop-archives.jar file in your workflow. Also don't forget to put the jar in your workflow directory and you should be good to go. Hope that helps!

Python Oozie Shell Action Failing

I have an Oozie workflow that contains a shell action that invokes a Python script that is failing with the following error.
Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
The Python script (hello.py) is simple.
print("Hello, World!")
Here is my Oozie workflow.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hello">
<start to="shell-check-hour"/>
<action name="shell-check-hour">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>hello.py</exec>
<file>hdfs://localhost:8020/user/test/hello.py</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Can someone see anything wrong with what I am doing? If I replace the Python script with a shell script, the shell script executes fine (both files are in the same directory). This leads me to believe that the problem is that for whatever reason, Python isn't being recognised by Oozie.

Add Hash-Bang to your script
For example, my script started with
#!/usr/bin/env python

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Oozie shell action not running as submitting user - shell

This issue will occur in all cluster that are configured using Simple Security. You've an option to override the default configuration. Include the below statement at the starting of the shell script will fix this issue. export HADOOP_USER_NAME=<Name of submitted user>;

Related

How to mark an Oozie workflow action's status as OK

oozie over hive to fetch the data from table

oozie shell action with spark-submit

Oozie Workflow with Archive Action

Python Oozie Shell Action Failing

Categories

Resources