Oozie - Hadoop commands are not executing (Shell) - shell

I am running a shell script that has hadoop commands.
Getting the following error when executing the same
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
I am running a simple shell script with Cloudera Hue - Oozie
However when the script has no hadoop commands, it gets executed sucessfully.
I have set oozie.use.system.libpath=true and could see my libs are in
user/oozie/share/lib/<lib_timestmap>
Below is the shell script I am trying to run
#! /bin/bash
$(hadoop fs -mkdir /<location path>)
Wokflow.xml
<workflow-app name="Shell-copy" xmlns="uri:oozie:workflow:0.4">
<start to="Shell-copy"/>
<action name="Shell-copy">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>test.sh</exec>
<file>/user/hue/oozie/workspaces/_rrv9kor_-oozie-38-1455857816.12/test.sh#test.sh</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
The scripts with hadoop commands either throwing the above error or displayed as completed successfully with none of the hadoop commands processed and only the echo's or other shell commands are executed.

The problem with the shell action is that shell jobs are deployed as the 'mapred' user. As the above is deploying the oozie job from a user account other than mapred user , Permission Denied error was thrown.The way to solve this is to set the HADOOP_USER_NAME environment variable to the user account name through which you deploy your oozie workflow.
<env-var>HADOOP_USER_NAME=user_name_goes_here</env-var>

Related

How to mark an Oozie workflow action's status as OK

I am using Apache oozie. I want to mark the status of one of the shell action as OK, in my oozie workflow. It is in Running state.
Can we please share the command to use in Apache Oozie to do this.
You don't need to explicitly set the status of an Action. Oozie automatically does that for you based on the action/task execution. For instance, let's say you have shell action that looks something like this:
<workflow-app
xmlns="uri:oozie:workflow:0.3" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell
xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>some-script.sh</exec>
<file>/user/src/some-script.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
If the /user/src/some-script.sh execution is successful, Oozie will mark the action status as ok and successfully ends the job. On the other hand if the script execution encounters any error, it will be marked as error immediately kill the job and directed. If you're looking for not to kill the job due to any abnormal execution of the code in your script, you can create another action and direct Oozie to follow that execution path instead of immediately killing the workflow. Checkout more about Oozie Shell Action.

Move file from HDFS one directory to other directory in HDFS using OOZIE?

I am trying to copy a file from HDFS one directory to other directory in HDFS, with the help of shell script as a part of oozie Job, but i am not able to copy it through oozie.
Can we copy file from HDFS one directory to other director in HDFS using oozie.
when i am running the oozie job, i am not any getting error.
it is showing status SUCCEEDED but file is not copying to destination directory.
oozie Files are below.
test.sh
#!/bin/bash
echo "listing files in the current directory, $PWD"
sudo hadoop fs -cp /user/cloudera/RAVIOOZIE/input/* /user/cloudera/RAVIOOZIE/output/
ls # list files
my workflow.xml is
<workflow-app name="RAMA" xmlns="uri:oozie:workflow:0.5">
<start to="shell-381c"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="shell-381c">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>/user/cloudera/test.sh</exec>
<file>/user/cloudera/test.sh#test.sh</file>
<capture-output/>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
and my job.properties
oozie.use.system.libpath=True
security_enabled=False
dryrun=True
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020
oozie.wf.application.path=${nameNode}/user/cloudera/test/
please help on this. why file is not copying to my destination director.
please let me know is there any thing i missed.
As mentioned in the comments by #Samson:
If you want to do hadoop actions with oozie, you should use a hdfs action rather than a shell action for that.
I am not sure why you don't get an error, but here is some speculation on what might happen:
You give oozie the task of starting a shell action, it succesfully starts the shell action and reports a success. Then the shell action fails, but that's not oozies problem.

How do I set path in oozie workflows?

I am trying to run a Shell Script on Oozie.First, I selected the path of shell script file, after which I added the arguments to run the shell script file. When I try running the oozie worflow, as such, it goes into a running loop which gets killed after 10 seconds.
I also added the Environment Variable by setting the path of the output folder in HDFS. When I run it, again it runs into a loop which gets killed after 10 seconds. I am unable to figure out how to set the path. Please help.
Your question is not clear,
But I gess , your are trying to run Shell Script using Oozie workflow, where Shell Script Arguments will be pass from Oozie it self.
If My understanding is right, you can pass the Argument variable from Oozie via coordinator.properties/coordinator.xml/workflow.xml it self.
Example:
let say You have a shell script, which will perform distcp everytime its execute to anothere dfs location.
Shell Script:
> hadoop dfs -rmr destination_location
> hadoop distcp hdfs://<source_dfs><source_dfs_location> hdfs://<destination_dfs><destination_dfs_location>
workflow.xml:
<action name="shellAction">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>oozie.launcher.mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>shell_script.sh</exec>
<file>hdfs://<dfs:port>/<dfs_location/shell_script.sh></file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="killAction"/>
</action>
<kill name="killAction">
<message>Shell Action Failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
Note: Shell action chould be defined in oozie_site.xml
Belive these will help u some point

Oozie shell action not running as submitting user

I've written an Oozie workflow that runs a BASH shell script to do some hive queries and perform some actions on the results. The script runs but throws a permission error when accessing some of the HDFS data. The user that submitted the Oozie workflow has permission but the script is running as the yarn user.
Is it possible to make Oozie execute the script as the user who submitted the workflow? Hive and Java actions both execute as the submitted user, just shell is behaving differently.
Here's the rough outline of my Oozie action
<action name="start_action"
retry-max="12"
retry-interval="600">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${WorkflowRoot}/hive-site.xml</job-xml>
<exec>script.sh</exec>
<file>${WorkflowRoot}/script.sh</file>
<capture-output />
</shell>
<ok to="next_action"/>
<error to="send_email"/>
</action>
I'm running Oozie 4.1.0 and HDP 2.1.
This issue will occur in all cluster that are configured using Simple Security. You've an option to override the default configuration. Include the below statement at the starting of the shell script will fix this issue.
export HADOOP_USER_NAME=<Name of submitted user>;
you can make run with help of env-var
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<workflow-app xmlns="uri:oozie:workflow:0.3" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>test.sh</exec>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>/user/root/test.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

Python Oozie Shell Action Failing

I have an Oozie workflow that contains a shell action that invokes a Python script that is failing with the following error.
Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
The Python script (hello.py) is simple.
print("Hello, World!")
Here is my Oozie workflow.
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hello">
<start to="shell-check-hour"/>
<action name="shell-check-hour">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>hello.py</exec>
<file>hdfs://localhost:8020/user/test/hello.py</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Can someone see anything wrong with what I am doing? If I replace the Python script with a shell script, the shell script executes fine (both files are in the same directory). This leads me to believe that the problem is that for whatever reason, Python isn't being recognised by Oozie.
Add Hash-Bang to your script
For example, my script started with
#!/usr/bin/env python

Resources