How do I set a path in Oozie workflows? - shell

I am trying to run a shell script on Oozie. First, I selected the path of the shell script file, after which I added the arguments to run it. When I try running the Oozie workflow, it goes into a running loop which gets killed after 10 seconds.
I also added an environment variable by setting the path of the output folder in HDFS. When I run it, it again runs into a loop which gets killed after 10 seconds. I am unable to figure out how to set the path. Please help.

Your question is not clear, but I guess you are trying to run a shell script using an Oozie workflow, where the shell script arguments will be passed from Oozie itself.
If my understanding is right, you can pass the argument variables from Oozie via coordinator.properties/coordinator.xml/workflow.xml itself.
Example:
Let's say you have a shell script which performs a distcp to another DFS location every time it executes.
Shell Script:
> hadoop dfs -rmr destination_location
> hadoop distcp hdfs://<source_dfs><source_dfs_location> hdfs://<destination_dfs><destination_dfs_location>
workflow.xml:
<action name="shellAction">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>oozie.launcher.mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>shell_script.sh</exec>
<file>hdfs://<dfs>:<port>/<dfs_location>/shell_script.sh</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="killAction"/>
</action>
<kill name="killAction">
<message>Shell Action Failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
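Since the point here is passing arguments from Oozie, a minimal sketch of how the exec block could hand the source and destination locations to the script (the property names sourceDir and destDir are made up for illustration and would be defined in job.properties or the coordinator; the script would then refer to them as $1 and $2 instead of hard-coding the paths):
<exec>shell_script.sh</exec>
<argument>${sourceDir}</argument>
<argument>${destDir}</argument>
<file>hdfs://<dfs>:<port>/<dfs_location>/shell_script.sh</file>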
Note: the shell action should be defined in oozie-site.xml.
I believe this will help you at some point.

Related

What permission does a Presto CLI command inside a shell script require in order to be able to execute

I have an Oozie workflow which calls a shell script. The shell script calls a Presto CLI query. If I execute the shell script from the PuTTY terminal, everything works as expected.
But when I run the workflow, an error is raised. I know the shell script itself is being called successfully, as I can see my echo statement in the log file as below.
This script is stored inside the Hadoop file system.
shell script :
#!/bin/bash
echo Execute Presto Command
/home/hdfs/presto-cli --server 1.1.1.1:8080 --catalog hive --schema test --execute 'SELECT * from test limit 10'
However, when the presto-cli command is called, a "Permission Denied" error is raised.
In the Oozie GUI I can see that the action is running as my HDFS user, so I am not sure what permission it refers to.
I read somewhere that you need to tell the workflow to run as the user who launched the workflow and not as the YARN user. I added the env-var tag to my workflow action (see below), but this didn't help either.
My workflow's action is below:
<action name="Shell-Action">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>_myshell.sh</exec>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>/user/hdfs/scripts/myshell.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
What am I doing wrong?
Thanks

Oozie suppresses logging from shell job action?

I have a simple workflow (see below) which runs a shell script. The shell script runs a PySpark script, which moves a file from a local folder to an HDFS folder.
When I run the shell script itself, it works perfectly; logs are redirected to a file by > spark.txt 2>&1 right in the shell script.
But when I submit the Oozie job with the following workflow, the output from the shell seems to be suppressed. I tried to redirect all possible Oozie logs (-verbose -log) with > oozie.txt 2>&1, but it didn't help.
The workflow finishes successfully (status SUCCEEDED, no error log), but I can see the folder is not copied to HDFS; however, when I run it alone (not through Oozie), everything is fine.
<action name="forceLoadFromLocal2hdfs">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>driver-script.sh</exec>
<argument>s</argument>
<argument>script.py</argument>
<!-- arguments for py script -->
<argument>hdfsPath</argument>
<argument>localPath</argument>
<file>driver-script.sh#driver-script.sh</file>
</shell>
<ok to="end"/>
<error to="killAction"/>
</action>
Thanks a lot!
EDIT: Thanks to the advice, I found the full log under
yarn logs -applicationId [application_xxxxxx_xxxx]

Move a file from one HDFS directory to another HDFS directory using Oozie?

I am trying to copy a file from one HDFS directory to another HDFS directory with the help of a shell script, as part of an Oozie job, but I am not able to copy it through Oozie.
Can we copy a file from one HDFS directory to another HDFS directory using Oozie?
When I run the Oozie job, I am not getting any error.
It shows status SUCCEEDED, but the file is not copied to the destination directory.
The Oozie files are below.
test.sh
#!/bin/bash
echo "listing files in the current directory, $PWD"
sudo hadoop fs -cp /user/cloudera/RAVIOOZIE/input/* /user/cloudera/RAVIOOZIE/output/
ls # list files
my workflow.xml is
<workflow-app name="RAMA" xmlns="uri:oozie:workflow:0.5">
<start to="shell-381c"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="shell-381c">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>/user/cloudera/test.sh</exec>
<file>/user/cloudera/test.sh#test.sh</file>
<capture-output/>
</shell>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
and my job.properties
oozie.use.system.libpath=True
security_enabled=False
dryrun=True
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020
oozie.wf.application.path=${nameNode}/user/cloudera/test/
Please help on this: why is the file not copied to my destination directory?
Please let me know if there is anything I missed.
As mentioned in the comments by @Samson:
If you want to do Hadoop actions with Oozie, you should use an HDFS action rather than a shell action for that.
I am not sure why you don't get an error, but here is some speculation on what might happen:
You give Oozie the task of starting a shell action; it successfully starts the shell action and reports success. Then the shell action fails, but that's not Oozie's problem.
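For reference, a minimal sketch of what such an action could look like with the paths from the question. This is illustrative only: the Oozie <fs> action supports operations like move, delete and mkdir but not a plain copy (move is essentially a rename), so an actual copy would need something like the distcp action instead.
<action name="fs-move">
<fs>
<move source="/user/cloudera/RAVIOOZIE/input" target="/user/cloudera/RAVIOOZIE/output/input"/>
</fs>
<ok to="End"/>
<error to="Kill"/>
</action>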

Oozie - Hadoop commands are not executing (Shell)

I am running a shell script that has Hadoop commands, and I get the following error when executing it:
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
I am running a simple shell script with Cloudera Hue - Oozie.
However, when the script has no Hadoop commands, it executes successfully.
I have set oozie.use.system.libpath=true and can see my libs are in
/user/oozie/share/lib/<lib_timestamp>
Below is the shell script I am trying to run
#! /bin/bash
$(hadoop fs -mkdir /<location path>)
Workflow.xml
<workflow-app name="Shell-copy" xmlns="uri:oozie:workflow:0.4">
<start to="Shell-copy"/>
<action name="Shell-copy">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>test.sh</exec>
<file>/user/hue/oozie/workspaces/_rrv9kor_-oozie-38-1455857816.12/test.sh#test.sh</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
The scripts with Hadoop commands either throw the above error or are shown as completed successfully with none of the Hadoop commands processed; only the echoes and other shell commands are executed.
The problem with the shell action is that shell jobs are deployed as the 'mapred' user. Since the above deploys the Oozie job from a user account other than the mapred user, a Permission Denied error was thrown. The way to solve this is to set the HADOOP_USER_NAME environment variable to the user account through which you deploy your Oozie workflow.
<env-var>HADOOP_USER_NAME=user_name_goes_here</env-var>
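For context, a sketch of where that line would sit in the shell action from the question; using ${wf:user()} instead of a hard-coded name is one option, and the element order (env-var after exec, before file) follows the shell action schema:
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>test.sh</exec>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>/user/hue/oozie/workspaces/_rrv9kor_-oozie-38-1455857816.12/test.sh#test.sh</file>
<capture-output/>
</shell>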

How to get the Oozie jobId in an Oozie workflow?

I have an Oozie workflow that invokes a shell file, and the shell file in turn invokes the driver class of a MapReduce job. Now I want to map my Oozie jobId to the MapReduce jobId for later processing. Is there any way to get the Oozie jobId in the workflow file so that I can pass it as an argument to my driver class for mapping?
Following is my sample workflow.xml file
<workflow-app xmlns="uri:oozie:workflow:0.4" name="test">
<start to="start-test" />
<action name='start-test'>
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${jobScript}</exec>
<argument>${fileLocation}</argument>
<argument>${nameNode}</argument>
<argument>${jobId}</argument> <!-- this is how i wanted to pass oozie jobId -->
<file>${jobScriptWithPath}#${jobScript}</file>
</shell>
<ok to="end" />
<error to="kill" />
</action>
<kill name="kill">
<message>test job failed
failed:[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>
Following is my shell script.
hadoop jar testProject.jar testProject.MrDriver $1 $2 $3
Try to use ${wf:id()}:
String wf:id()
It returns the workflow job ID for the current workflow job.
More info in the Oozie documentation on workflow EL functions.
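For example, the ${jobId} argument in the workflow above could be replaced with the EL function directly; the value then reaches the script as $3 and is forwarded to MrDriver unchanged:
<argument>${wf:id()}</argument>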
Oozie drops an XML file in the CWD of the YARN container running the shell (the "launcher" container), and also sets an env variable pointing to that XML (I cannot remember the name, though).
That XML contains a lot of information, such as the name of the workflow, the name of the action, the ID of both, the run attempt number, etc.
So you can sed that information back out in the shell script itself.
Of course, passing the ID explicitly (as suggested by Alexei) would be cleaner, but sometimes "clean" is not the best way, especially if you are concerned about whether it's the first run or not...
