I have an Oozie job as follows:
<action name="action1">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${sshHost}</host>
<command>${scriptDir}/${scriptName}</command>
<capture-output/>
</ssh>
<ok to="scheduleOkAction" />
<error to="scheduleErrorAction" />
</action>
<action name="action2">
<email xmlns="uri:oozie:email-action:0.1">
<to>${emailToAddress}</to>
<subject>Workflow failed to run</subject>
<body>
[Workflow - ${wf:name()}] failed to run.
Stack Trace
${wf:actionData('action1')['ERROR']}
</body>
</email>
<ok to="kill" />
<error to="kill" />
</action>
<action name="action3">
<email xmlns="uri:oozie:email-action:0.1">
<to>${emailToAddress}</to>
<subject>Success</subject>
<body>
[Workflow - ${wf:name()}] - OK.
${wf:actionData('action1')['SUCCESS_CODE']}
</body>
</email>
<ok to="kill" />
<error to="kill" />
</action>
When 'action1' runs without an exception, the workflow takes me to 'action3' and sends me an email with the 'SUCCESS_CODE' actionData. But when 'action1' throws an exception, the workflow takes me to 'action2' and sends me an email without the 'ERROR' actionData.
Is it valid behavior in Oozie to lose all actionData in the event of an error? Or am I missing something?
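As far as I can tell, this is expected: Oozie stores the output captured by <capture-output/> only when the action finishes successfully, so wf:actionData('action1') has nothing to return once action1 has taken its error transition. A possible workaround is to build the failure email from the error EL functions instead. The sketch below reuses the action and property names from the question; how informative the message is for an ssh action is something I have not verified.

```xml
<!-- Sketch: report the failure through Oozie's error EL functions instead of
     actionData, which is assumed to be empty when action1 ends in error. -->
<action name="action2">
    <email xmlns="uri:oozie:email-action:0.1">
        <to>${emailToAddress}</to>
        <subject>Workflow failed to run</subject>
        <body>
[Workflow - ${wf:name()}] failed to run.
Failed node: ${wf:lastErrorNode()}
Error code: ${wf:errorCode('action1')}
Error message: ${wf:errorMessage('action1')}
        </body>
    </email>
    <ok to="kill"/>
    <error to="kill"/>
</action>
```

If the script's own diagnostics are needed in the email, another option is to have the script exit 0, print a status key to stdout, and branch on that key in a decision node.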
I am getting the error message below when reading config properties in a separate sub-workflow file. I am posting the sample code. I would appreciate your help in resolving this issue. Thank you!
2019-01-17 08:44:52,885 WARN ActionStartXCommand:523 - SERVER[localhost] USER[user1] GROUP[-] TOKEN[] APP[subWorkflow] JOB[0338958-190114130857167-oozie-oozi-W] ACTION[0338958-190114130857167-oozie-oozi-W#subWorkflowAction1] ELException in ActionStartXCommand
javax.servlet.jsp.el.ELException: variable [jobtracker] cannot be resolved
Coordinator job trigger command
oozie job --oozie http://localhost:11000/oozie --config /home/user/oozie-scripts/props/job.properties -run
job.properties
namenode=hdfs://localhost
workflowpath=${namenode}/user/user1/oozie-workflow/parentWorkflow.xml
frequency=25
starttime=2018-08-06T13\:29Z
endtime=2108-08-06T13\:29Z
timezone=UTC
oozie.coord.application.path=${namenode}/user/user1/oozie-workflow/coordinator.xml
jobtracker=http://localhost:8088
scriptpath=/user/user1/oozie-workflow
Coordinator
<coordinator-app name="sampleCoord" frequency="${frequency}" start="${starttime}" end="${endtime}" timezone="${timezone}" xmlns="uri:oozie:coordinator:0.4">
<action>
<workflow>
<app-path>${workflowpath}</app-path>
</workflow>
</action>
</coordinator-app>
Parent Workflow
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "Parent-Workflow">
<start to = "workflowAction1" />
<action name = "workflowAction1">
<sub-workflow>
<app-path>/user/user1/oozie-workflow/subWorkflow1.xml</app-path>
</sub-workflow>
<ok to = "end" />
<error to = "end" />
</action>
<end name = "end" />
</workflow-app>
Sub-Workflow
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "subWorkflow">
<start to = "subWorkflowAction1" />
<action name = "subWorkflowAction1">
<hive xmlns = "uri:oozie:hive-action:0.4">
<job-tracker>${jobtracker}</job-tracker>
<script>${scriptpath}/dropTempTable.hive</script>
<param>Temp_TableVar=${concat(concat("HBASE_",replaceAll(wf:id(),"- ","_")),"_TEMP")}</param>
</hive>
<ok to = "end" />
<error to = "kill_job" />
</action>
<kill name = "kill_job">
<message>Job failed</message>
</kill>
<end name = "end" />
</workflow-app>
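Adding the propagate-configuration tag to the sub-workflow action in the parent workflow XML file resolved the issue. Without it, the parent's job configuration (including jobtracker) is not passed down to the sub-workflow.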
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "Parent-Workflow">
<start to = "workflowAction1" />
<action name = "workflowAction1">
<sub-workflow>
<app-path>/user/user1/oozie-workflow/subWorkflow1.xml</app-path>
<propagate-configuration />
</sub-workflow>
<ok to = "end" />
<error to = "end" />
</action>
<end name = "end" />
</workflow-app>
Currently I have 6 actions in my Oozie workflow, as shown below.
After MainJob1 completes, the first, second and third jobs should all run in parallel.
After MainJob2 completes, only the second and third jobs should run in parallel.
Is there any way to get this kind of workflow execution?
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
....
<decision name="execution-mode-decision">
<switch>
<case to="MainJob1">${executionMode eq "DEFAULT"}</case>
<case to="MainJob2">${executionMode eq "INVALID"}</case>
<default to="MainJob1" />
</switch>
</decision>
<action name="MainJob1">
<map-reduce>
.......
</map-reduce>
<ok to="fork1"/>
<error to="kill"/>
</action>
<action name="MainJob2">
<map-reduce>
......
</map-reduce>
<ok to="fork2"/>
<error to="kill"/>
</action>
...
<fork name="fork1">
<path start="firstparalleljob"/>
<path start="secondparalleljob"/>
<path start="thirdparalleljob"/>
</fork>
<fork name="fork2">
<path start="secondparalleljob"/>
<path start="thirdparalleljob"/>
</fork>
<action name="firstparalleljob">
<map-reduce>
...........
</map-reduce>
<ok to="joining"/>
<error to="kill"/>
</action>
<action name="secondparalleljob">
<map-reduce>
........
</map-reduce>
<ok to="joining"/>
<error to="kill"/>
</action>
<action name="thirdparalleljob">
<map-reduce>
........
</map-reduce>
<ok to="joining"/>
<error to="kill"/>
</action>
<join name="joining" to="emailFailure"/>
...
</workflow-app>
You can put firstparalleljob, secondparalleljob and thirdparalleljob into three separate sub-workflows, then call all three sub-workflows from the first fork and two of them from the second fork. This way you can even pass a different value for a variable to the same job, depending on which fork starts it.
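A rough sketch of what that could look like. The wrapper action names, ${subWfDir} and the "mode" property are hypothetical; each directory under ${subWfDir} is assumed to hold a workflow.xml wrapping one of the original map-reduce jobs. As far as I know, Oozie expects every fork to be closed by its own join, which is why the sketch uses two joins.

```xml
<!-- Sketch only. ${subWfDir}, the action names and the "mode" property are
     hypothetical. -->
<fork name="fork1">
    <path start="first-after-main1"/>
    <path start="second-after-main1"/>
    <path start="third-after-main1"/>
</fork>
<fork name="fork2">
    <path start="second-after-main2"/>
    <path start="third-after-main2"/>
</fork>

<!-- One wrapper action per (fork, job) pair; only two are written out here. -->
<action name="second-after-main1">
    <sub-workflow>
        <app-path>${subWfDir}/second</app-path>
        <propagate-configuration/>
        <configuration>
            <property>
                <name>mode</name>
                <value>DEFAULT</value>
            </property>
        </configuration>
    </sub-workflow>
    <ok to="join1"/>
    <error to="kill"/>
</action>
<action name="second-after-main2">
    <sub-workflow>
        <app-path>${subWfDir}/second</app-path>
        <propagate-configuration/>
        <configuration>
            <property>
                <name>mode</name>
                <value>INVALID</value>
            </property>
        </configuration>
    </sub-workflow>
    <ok to="join2"/>
    <error to="kill"/>
</action>

<!-- Each fork gets its own join; both continue to the node the original join used. -->
<join name="join1" to="emailFailure"/>
<join name="join2" to="emailFailure"/>
```

Because every node name is now unique to one fork, the same job definition can be reused in both branches, and the per-call configuration block is where a different value can be passed in.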
My actions
start_fair_usage ends with status OK, but test_copy returns
Main class [org.apache.oozie.action.hadoop.DistcpMain], main() threw exception, null
In /user/comverse/data/${1}_B I have a lot of different files, some of which I want to copy to ${NAME_NODE}/user/evkuzmin/output. To do that, I try to pass the paths from copy_file.sh, which holds an array of paths to the files I need.
<action name="start_fair_usage">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${JOB_TRACKER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<exec>${copy_file}</exec>
<argument>${today_without_dash}</argument>
<argument>${mta}</argument>
<!-- <file>${path}#${start_fair_usage}</file> -->
<file>${path}${copy_file}#${copy_file}</file>
<capture-output/>
</shell>
<ok to="test_copy"/>
<error to="KILL"/>
</action>
<action name="test_copy">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<job-tracker>${JOB_TRACKER}</job-tracker>
<name-node>${NAME_NODE}</name-node>
<arg>${wf:actionData('start_fair_usage')['paths']}</arg>
<!-- <arg>${NAME_NODE}/user/evkuzmin/input/*</arg> -->
<arg>${NAME_NODE}/user/evkuzmin/output</arg>
</distcp>
<ok to="END"/>
<error to="KILL"/>
</action>
start_fair_usage starts copy_file.sh
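#!/usr/bin/env bash
# $1 = date without dashes, $2 = file type (e.g. mta)
echo "${1} ${2}" >&2
dirs=(
/user/comverse/data/${1}_B
)
args=()
# Column 8 of `hadoop fs -ls` is the file path; keep only the matching .gz files.
for i in $(hadoop fs -ls "${dirs[@]}" | grep -E "${2}\.gz" | awk '{print $8}')
do
args+=("$i")
# Log to stderr so stdout carries only the key=value line below.
echo "copy file - ${i}" >&2
done
# <capture-output/> reads key=value lines from stdout; ${args[*]} expands to
# every element, whereas ${args} would give only the first one.
# Note: a space-separated list still reaches distcp as a single <arg>.
echo "paths=${args[*]}"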
Here is what I did in the end.
<start to="start_copy"/>
<fork name="start_copy">
<path start="copy_mta"/>
<path start="copy_rcr"/>
<path start="copy_sub"/>
</fork>
<action name="copy_mta">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<prepare>
<delete path="${NAME_NODE}${dstFolder}mta/*"/>
</prepare>
<arg>${NAME_NODE}${srcFolder}/*mta.gz</arg>
<arg>${NAME_NODE}${dstFolder}mta/</arg>
</distcp>
<ok to="end_copy"/>
<error to="KILL"/>
</action>
<action name="copy_rcr">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<prepare>
<delete path="${NAME_NODE}${dstFolder}rcr/*"/>
</prepare>
<arg>${NAME_NODE}${srcFolder}/*rcr.gz</arg>
<arg>${NAME_NODE}${dstFolder}rcr/</arg>
</distcp>
<ok to="end_copy"/>
<error to="KILL"/>
</action>
<action name="copy_sub">
<distcp xmlns="uri:oozie:distcp-action:0.2">
<prepare>
<delete path="${NAME_NODE}${dstFolder}sub/*"/>
</prepare>
<arg>${NAME_NODE}${srcFolder}/*sub.gz</arg>
<arg>${NAME_NODE}${dstFolder}sub/</arg>
</distcp>
<ok to="end_copy"/>
<error to="KILL"/>
</action>
<join name="end_copy" to="END"/>
<kill name="KILL">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="END"/>
It turned out that distcp accepts wildcards, so I didn't need bash at all.
Also, some people advised me to write it in Scala.
import java.io.File
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, FileUtil}

val conf = new Configuration()
val fs = FileSystem.get(conf)
val listOfFileTypes = List("mta", "rcr", "sub")
val listOfPlatforms = List("B", "C", "H", "M", "Y")
for (fileType <- listOfFileTypes) {
  // Empty the target directory first. Note: fullyDeleteContents takes a java.io.File.
  FileUtil.fullyDeleteContents(new File("/apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin/file_" + fileType))
  for (platform <- listOfPlatforms) {
    // Expand the wildcard into the matching source files.
    val srcPaths = fs.globStatus(new Path("/user/comverse/data/" + "20170404" + "_" + platform + "/*" + fileType + ".gz"))
    val dstPath = new Path("/apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin/file_" + fileType)
    for (srcPath <- srcPaths) {
      println("copying " + srcPath.getPath.toString)
      // deleteSource = false: copy the file, do not move it.
      FileUtil.copy(fs, srcPath.getPath, fs, dstPath, false, conf)
    }
  }
}
Both approaches work, though I haven't tried to run the Scala script in Oozie.
I want to check whether a file exists in an HDFS location, using an Oozie batch job.
Every day at 11 PM a new file arrives in my HDFS location, named like "test_08_01_2016.csv", "test_08_02_2016.csv".
So after 11:15 PM I want to check whether the file exists. I can do that with a decision node, using the workflow below.
<workflow-app name="HIVECoWorkflow" xmlns="uri:oozie:workflow:0.5">
<start to="CheckFile"/>
<decision name="CheckFile">
<switch>
<case to="nextOozieTask">
${fs:exists("/user/cloudera/file/input/test_08_01_2016.csv")}
</case>
<default to="MailActionFileMissing" />
</switch>
</decision>
<action name="MailActionFileMissing" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://quickstart.cloudera:10000/default</jdbc-url>
<script>/user/cloudera/email/select.hql</script>
<file>/user/cloudera/hive-site.xml</file>
</hive2>
<ok to="End"/>
<error to="Kill"/>
</action>
<action name="nextOozieTask" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://quickstart.cloudera:10000/default</jdbc-url>
<script>/user/cloudera/email/select1.hql</script>
<file>/user/cloudera/hive-site.xml</file>
</hive2>
<ok to="End"/>
<error to="Kill"/>
</action>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="End"/>
</workflow-app>
But I want to build the file name dynamically, for example
"filename_todaysdate", i.e. test_08_01_2016.csv.
Please help me with how I can get the file name dynamically.
Thanks in advance.
The solution to the above question is to get the date value from the coordinator job, using a property like the one below inside the coordinator definition.
<property>
<name>today</name>
<value>${coord:formatTime(coord:dateTzOffset(coord:nominalTime(), "America/Los_Angeles"), 'yyyyMMdd')}</value>
</property>
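For context, a property like this normally sits in the configuration block of the coordinator's workflow action. A minimal sketch, with several assumptions: the coordinator name and ${workflowAppPath} are hypothetical, the 'MM_dd_yyyy' pattern is chosen to match the file names in the question rather than the 'yyyyMMdd' pattern above, and the "yesterday" property is added only because the workflow further down refers to it.

```xml
<!-- Sketch only: name, ${workflowAppPath} and the date pattern are assumptions. -->
<coordinator-app name="file-check-coord" frequency="${coord:days(1)}"
                 start="${jobStart}" end="${jobEnd}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${workflowAppPath}</app-path>
            <configuration>
                <!-- Date part of today's file name, e.g. 08_01_2016.csv -->
                <property>
                    <name>today</name>
                    <value>${coord:formatTime(coord:dateTzOffset(coord:nominalTime(), "America/Los_Angeles"), 'MM_dd_yyyy')}.csv</value>
                </property>
                <!-- Same for the previous day, one DAY before the nominal time -->
                <property>
                    <name>yesterday</name>
                    <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), 'MM_dd_yyyy')}.csv</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

The workflow then sees today and yesterday as ordinary parameters and can concatenate them with the name node and the path prefix.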
We can then check whether the file exists in the given HDFS location with the help of fs:exists, i.e.
${fs:exists(concat(concat(nameNode, path),today))}
And in the workflow we use the “today” date value passed in from the coordinator job, as in the code below.
<workflow-app name="HIVECoWorkflow" xmlns="uri:oozie:workflow:0.5">
<start to="CheckFile"/>
<decision name="CheckFile">
<switch>
<case to="nextOozieTask">
${fs:exists(concat(concat(nameNode, path),today))}
</case>
<case to="nextOozieTask1">
${fs:exists(concat(concat(nameNode, path),yesterday))}
</case>
<default to="MailActionFileMissing" />
</switch>
</decision>
<action name="MailActionFileMissing" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://quickstart.cloudera:10000/default</jdbc-url>
<script>/user/cloudera/email/select.hql</script>
<file>/user/cloudera/hive-site.xml</file>
</hive2>
<ok to="End"/>
<error to="Kill"/>
</action>
<action name="nextOozieTask" cred="hive2">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://quickstart.cloudera:10000/default</jdbc-url>
<script>/user/cloudera/email/select1.hql</script>
<file>/user/cloudera/hive-site.xml</file>
</hive2>
<ok to="End"/>
<error to="Kill"/>
</action>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="End"/>
</workflow-app>
In job.properties we can declare all the static values, as below.
jobStart=2016-08-23T09:50Z
jobEnd=2016-08-23T10:26Z
tzOffset=-8
initialDataset=2016-08-23T09:50Z
oozie.use.system.libpath=True
security_enabled=False
dryrun=True
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020
test=${nameNode}/user/cloudera/email1
oozie.coord.application.path=${nameNode}/user/cloudera/email1/add-partition-coord-app.xml
path=/user/cloudera/file/input/ravi_
Maybe you can write a shell script that does the HDFS file-exists check and returns 0 on success and 1 otherwise. Based on that, rewire the ok and error transitions of the Oozie workflow.
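A rough sketch of how that suggestion could be wired into the workflow from the question. check_file.sh and ${scriptDir} are hypothetical; the script is assumed to do nothing but run hdfs dfs -test -e on its first argument, which exits 0 when the path exists. The other names come from the workflow and job.properties above.

```xml
<!-- Sketch: check_file.sh is a hypothetical one-line script,
     hdfs dfs -test -e "$1"
     so its exit status decides which transition is taken. -->
<action name="CheckFileShell">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>check_file.sh</exec>
        <argument>${nameNode}${path}${today}</argument>
        <file>${scriptDir}/check_file.sh#check_file.sh</file>
    </shell>
    <!-- exit 0: the file exists -->
    <ok to="nextOozieTask"/>
    <!-- non-zero exit: treated as "file missing" -->
    <error to="MailActionFileMissing"/>
</action>
```

One trade-off of this design is that any other failure of the script, for example a missing hdfs binary on the node, also ends up on the error transition and looks like a missing file.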
I have to run a set of parallel jobs in Oozie, which I am able to do using the fork option.
The problem I am facing is that if one job fails, the rest of the jobs also fail, because I am calling the kill control node on error for every single job.
I've searched the web a lot but couldn't find how to handle the error cleanup separately for every single job.
Any help would be appreciated.
My workflow.xml is as follows:
<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
<start to="copyfroms3tohdfs"/>
<action name="copyfroms3tohdfs">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${s3tohdfsscript}</command>
<capture-output/>
</ssh>
<ok to="createhivetables"/>
<error to="killAction"/>
</action>
<action name="createhivetables">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${createhivetablesscript}</command>
<capture-output/>
</ssh>
<ok to="gold__pos_denorm_trn_itm_offr"/>
<error to="killAction"/>
</action>
<action name="gold__pos_denorm_trn_itm_offr">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${denormalizationscript}</command>
<capture-output/>
</ssh>
<ok to="forknode"/>
<error to="killAction"/>
</action>
<fork name="forknode">
<path start="gold__dypt_pos_trn_offr"/>
<path start="gold__hr_pos_trn_offr"/>
<path start="approach3"/>
<path start="aproach11"/>
<path start="aproach12"/>
<path start="aproach13"/>
<path start="aproach14"/>
<path start="aproach15"/>
<path start="aproach16"/>
<path start="aproach17"/>
</fork>
<action name="gold__dypt_pos_trn_offr">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${daypartscript}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="gold__hr_pos_trn_offr">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${hourscript}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="approach3">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach3script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach11">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach11script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach12">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach12script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach13">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach13script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach14">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach14script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach15">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach15script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach16">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach16script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach17">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach17script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<join name="joinnode" to="end"/>
<kill name="killAction">
<message>"Killed job due to error"</message>
</kill>
<end name="end"/>
</workflow-app>
Create a new node (typically a java action) that performs the cleanup activities for you, and route all the "error to" transitions to this new node. You can identify the node that actually caused the error using the EL function ${wf:lastErrorNode()}. Pass this as an argument to the cleanup node, so that inside Java you can implement whatever cleanup logic you need (for example with the Java HDFS API).
The new node would look like:
<action name="myCleanUpAction">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>com.foo.CleanUpMain</main-class>
<arg>${wf:lastErrorNode()}</arg>
<arg>any useful argument1</arg>
<arg>any useful argument2</arg>
</java>
<ok to="fail"/>
<error to="fail"/>
</action>
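To wire this in, each forked action's error transition would point at the cleanup action, and the fail node it transitions to has to exist. A minimal sketch for one of the forked actions from the question; the definition of the fail kill node is an assumption, since it is referenced above without being shown. Keep in mind that, as far as I know, once the cleanup action reaches a kill node the whole workflow is killed, including sibling fork paths that are still running.

```xml
<action name="gold__dypt_pos_trn_offr">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>${CMNodeLogin}</host>
        <command>${daypartscript}</command>
        <capture-output/>
    </ssh>
    <ok to="joinnode"/>
    <!-- was: <error to="killAction"/> -->
    <error to="myCleanUpAction"/>
</action>

<!-- Assumed definition of the "fail" node the cleanup action transitions to. -->
<kill name="fail">
    <message>Failed at [${wf:lastErrorNode()}]: ${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
```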