How to invoke an oozie workflow via shell script and block/wait till workflow completion


I have created an Oozie workflow that consists of multiple action nodes, and I have successfully run it via a coordinator.
I want to invoke the Oozie workflow via a wrapper shell script.
The wrapper script should invoke the Oozie command, wait until the Oozie job completes (success or error), and return the Oozie success status code (0) or the error code of the failed Oozie action node (if any node of the workflow has failed).
From what I have seen so far, as soon as I invoke the oozie command to run a workflow, the command exits after printing the job ID on the Linux console, while the Oozie job keeps running asynchronously in the background.
I want my wrapper script to block until the Oozie job completes and then return the success/error code.
Can you please let me know whether and how I can achieve this using any Oozie features?
I am using Oozie version 3.3.2 and the bash shell on Linux.
Note: In case anyone is curious why I need such a feature, the requirement is that my wrapper shell script should know how long an Oozie job has been running and when it has completed, and return the corresponding exit code, so that the parent process calling the wrapper script knows whether the job completed successfully and, if it failed, can raise an alert/ticket for the support team.

You can do that by using the job ID: start a loop and parse the output of oozie job -info. Below is the shell code for it.
# Start the Oozie job
oozie_job_id=$(oozie job -oozie http://<oozie-server>/oozie -config job.properties -run)
echo "$oozie_job_id"
sleep 30
# Parse the job ID from the output; the output format is "job: <jobid>"
job_id=$(echo "$oozie_job_id" | sed -n 's/job: \(.*\)/\1/p')
echo "$job_id"
# Check the job status at a regular interval and stop once it is no longer RUNNING
while true
do
  job_status=$(oozie job -oozie http://<oozie-server>/oozie -info "$job_id" | sed -n 's/Status\(.*\): \(.*\)/\2/p')
  if [ "$job_status" != "RUNNING" ]
  then
    echo "Job is completed with status $job_status"
    break
  fi
  # This sleep depends on your job; change the value accordingly
  echo "sleeping for 5 minutes"
  sleep 5m
done
This is a basic approach; you can modify it for your use case.
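The loop above stops polling but does not yet hand an exit code back to a parent process, which is what the question asks for. The script below is a minimal sketch of a blocking wrapper, assuming the oozie CLI is on PATH and that the 0/1/2 exit-code convention is our own choice rather than anything Oozie defines. It also keeps polling while the job is in PREP and reports the elapsed time through bash's SECONDS variable; the per-action error codes shown in the -info output are not parsed here.

```bash
#!/usr/bin/env bash
# Sketch of a blocking Oozie wrapper. Assumptions: the oozie CLI is on PATH, the
# server URL and job.properties path are passed as arguments, and the exit codes
# 0/1/2 are a convention chosen for this example (not defined by Oozie).

# Print the workflow-level status from `oozie job -info` output read on stdin.
parse_oozie_status() {
  sed -n 's/^Status[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Translate a final Oozie status into an exit code for the parent process.
status_to_exit_code() {
  case "$1" in
    SUCCEEDED)     echo 0 ;;
    KILLED|FAILED) echo 1 ;;
    *)             echo 2 ;;  # SUSPENDED, empty (info call failed) or unknown
  esac
}

# Poll until the job leaves PREP/RUNNING, then print its final status.
# SUSPENDED is treated as final so the wrapper cannot block forever.
wait_for_oozie_job() {
  local url="$1" job_id="$2" interval="${3:-60}" status
  while true; do
    status=$(oozie job -oozie "$url" -info "$job_id" | parse_oozie_status)
    case "$status" in
      PREP|RUNNING) sleep "$interval" ;;
      *) printf '%s\n' "$status"; return 0 ;;
    esac
  done
}

main() {
  local url="$1" props="$2" submit_output job_id final_status
  submit_output=$(oozie job -oozie "$url" -config "$props" -run) || return 2
  job_id=${submit_output#job: }
  echo "Submitted $job_id" >&2
  final_status=$(wait_for_oozie_job "$url" "$job_id" 60)
  # SECONDS is bash's built-in count of seconds since the script started.
  echo "Job $job_id ended as ${final_status:-UNKNOWN} after ${SECONDS}s" >&2
  return "$(status_to_exit_code "$final_status")"
}

# Run only when called with both arguments, e.g.:
#   ./oozie_wait.sh http://<oozie-server>/oozie job.properties; echo $?
if [[ $# -ge 2 ]]; then
  main "$1" "$2"
  exit $?
fi
```

The parent process can then branch on the wrapper's exit status (0 for SUCCEEDED, 1 for KILLED/FAILED, 2 for anything else) to decide whether to raise an alert.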

To upload the workflow definition to HDFS, use the following command:
hdfs dfs -copyFromLocal -f workflow.xml /user/hdfs/workflows/workflow.xml
To start the Oozie job, run the following two commands (write each one on a single line):
JOB_ID=$(oozie job -oozie http://<oozie-server>/oozie -config job.properties -submit)
oozie job -oozie http://<oozie-server>/oozie -start ${JOB_ID#*:} -config job.properties
Then poll the job with the command below, sleeping for some amount of time between attempts. If the command returns 0, parse its output for the job status; a non-zero return code means the -info call itself failed.
oozie job -oozie http://<oozie-server>/oozie -info ${JOB_ID#*:}
echo $?  # shows whether the command executed successfully or not
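The submit/start/poll steps above can be sketched as small bash functions. This assumes `oozie job -submit` prints a single line of the form "job: <id>"; note that ${JOB_ID#*:} on its own leaves a leading space, which the sketch strips. The function names are illustrative, not part of Oozie.

```bash
#!/usr/bin/env bash
# Sketch of the two-step -submit / -start flow above, plus one -info poll.
# Assumption: `oozie job -submit` prints a single line of the form "job: <id>".

# Strip the "job:" prefix and the space that ${JOB_ID#*:} alone leaves behind.
extract_job_id() {
  local out="${1#*:}"
  printf '%s\n' "${out//[[:space:]]/}"
}

# Submit, then start; prints the job ID on success, returns the CLI's status on failure.
submit_and_start() {
  local url="$1" props="$2" submit_out id
  submit_out=$(oozie job -oozie "$url" -config "$props" -submit) || return
  id=$(extract_job_id "$submit_out")
  oozie job -oozie "$url" -start "$id" || return
  printf '%s\n' "$id"
}

# One poll: a non-zero return means the -info call itself failed;
# otherwise the job's status is printed for the caller to inspect.
poll_status() {
  local url="$1" id="$2" output rc
  output=$(oozie job -oozie "$url" -info "$id")
  rc=$?
  if (( rc != 0 )); then
    echo "oozie job -info failed with exit code $rc" >&2
    return "$rc"
  fi
  printf '%s\n' "$output" | sed -n 's/^Status[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Example loop (hypothetical URL variable):
#   id=$(submit_and_start "$OOZIE_URL" job.properties) || exit 1
#   while status=$(poll_status "$OOZIE_URL" "$id") && [ "$status" = RUNNING ]; do sleep 60; done
```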

Related

Snakemake does not recognise job failure due to timeout with error code -11

Has anyone had a problem with Snakemake not recognizing a timed-out job? I submit jobs to a cluster using qsub, with a time-out set per rule:
snakemake --jobs 29 -k -p --latency-wait 60 --use-envmodules \
--cluster "qsub -l walltime={resources.walltime},nodes=1:ppn={threads},mem={resources.mem_mb}mb"
If a job fails within a script, the next one in line is executed. However, when a job hits the time-out defined in its rule, the next job in line is not executed, which reduces the total number of jobs running in parallel on the cluster over time. According to the MOAB scheduler (PBS server), a timed-out job gets a -11 exit status. As far as I understand, any non-zero exit status means failure; or does this only apply to positive integers?
Thanks in advance for any hint:)

If you don't provide a --cluster-status script, Snakemake checks job status internally via hidden marker files that the submitted job script touches. When a job times out, the Snakemake process on the compute node doesn't get a chance to report the failure to the main Snakemake instance, because the batch system kills the job when its walltime runs out.
You can use a cluster profile, or just grab a suitable cluster status script and pass it with --cluster-status (make it executable with chmod +x, and make sure qsub reports a parsable job ID).
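A status script of that kind could look roughly like the sketch below. It assumes a Torque/MOAB-style qstat whose `qstat -f` output contains "job_state = X" and "exit_status = N" lines, and that completed jobs stay visible to qstat for a while; a job qstat no longer knows is reported as failed, which is a choice made here, not Snakemake's rule. Snakemake passes the job ID as the only argument and expects "success", "failed" or "running" on stdout.

```bash
#!/usr/bin/env bash
# Hypothetical --cluster-status script for a Torque/MOAB-style PBS cluster.
# Snakemake passes the job ID as the only argument and expects the script to
# print "success", "failed" or "running".

# Read `qstat -f <id>` output on stdin and print Snakemake's status word.
classify_qstat() {
  local text state exit_status
  text=$(cat)
  state=$(printf '%s\n' "$text" |
    awk -F'=' 'tolower($1) ~ /job_state/ {gsub(/[[:space:]]/, "", $2); print $2; exit}')
  exit_status=$(printf '%s\n' "$text" |
    awk -F'=' 'tolower($1) ~ /exit_status/ {gsub(/[[:space:]]/, "", $2); print $2; exit}')
  case "$state" in
    Q|H|W|R|E|T|S) echo running ;;   # queued, held, waiting, running, exiting, transit, suspended
    C)                               # completed: success only for exit status 0 (so -11 fails)
      if [[ "$exit_status" == 0 ]]; then echo success; else echo failed; fi ;;
    *) echo failed ;;                # unknown to qstat (assumed failed)
  esac
}

if [[ $# -ge 1 ]]; then
  qstat -f "$1" 2>/dev/null | classify_qstat
fi
```

Saved as, say, pbs_status.sh and made executable, it would be passed as --cluster-status ./pbs_status.sh alongside the existing --cluster "qsub ..." option.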

Why oozie submits shell action to yarn?

I have recently started learning Oozie, and I am a little curious about the shell action. I am executing a shell action that contains a shell command like
hadoop jar <jarPath> <FQCN>
While this action runs, there are two YARN jobs running:
one for the hadoop job
one for the shell action
I don't understand why the shell action needs YARN for execution. I also tried the email action, and it executes without YARN resources.

To answer this question, the difference is between:
running a shell script independently (a .sh file or from the CLI), and
running a shell action as part of an Oozie workflow (a shell script in an Oozie shell action).
The first case is straightforward: the script runs on the machine where you start it.
In the second case, Oozie does not run the script on the Oozie server itself. It submits a small launcher MapReduce job to YARN (the cluster's resource negotiator), and that launcher runs your shell script on one of the cluster nodes. So the shell script runs inside a YARN application, which is the second job you see, while the hadoop jar command inside the script submits its own job. The Oozie workflow logs show how the shell action was launched. The email action, by contrast, is executed synchronously by the Oozie server, so it needs no YARN resources.
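To see the two applications described above side by side, one can filter the output of `yarn application -list`. This sketch assumes Oozie names its launcher jobs with the prefix "oozie:launcher:T=<action type>:..." and that the application ID and name are the first two columns; the --live flag is a hypothetical argument of this script.

```bash
#!/usr/bin/env bash
# Sketch: tell Oozie launcher applications apart from the jobs they start.
# Assumes launcher names begin with "oozie:launcher:" and that input comes from
# `yarn application -list` (Application-Id first, Application-Name second).

classify_yarn_apps() {
  awk '$1 ~ /^application_/ {
         kind = ($2 ~ /^oozie:launcher:/) ? "launcher" : "job"
         print kind, $1
       }'
}

# Hypothetical flag: query the cluster for currently running applications.
if [[ "${1:-}" == "--live" ]]; then
  yarn application -list -appStates RUNNING 2>/dev/null | classify_yarn_apps
fi
```

While the shell action runs, one would expect one "launcher" line for the shell action and one "job" line for the job started by hadoop jar.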

How to get the ID of MapReduce job submitted by the `hadoop jar <example.jar> <main-class>` command?

I want to write a shell script that submits a MapReduce job with the command hadoop jar <example.jar> <main-class>. How can I get the ID of the job submitted by that command in the shell script, right after the command is invoked?
I know that the command hadoop job -list displays all jobs' IDs, but then I can't tell which job was submitted by my shell script.
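One possible approach is to run hadoop jar in the background and read the job ID from its client log. This is a sketch assuming the driver calls job.waitForCompletion(true), so the client logs a line like "INFO mapreduce.Job: Running job: job_<cluster>_<n>"; the function names and the log-file argument are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: start `hadoop jar` in the background and read the job ID from its
# client log. Assumes the driver calls job.waitForCompletion(true), so the client
# logs a line like "INFO mapreduce.Job: Running job: job_<cluster>_<n>".

# Print the first MapReduce job ID found in hadoop client output on stdin.
extract_mr_job_id() {
  grep -o 'Running job: job_[0-9]*_[0-9]*' | head -n 1 | sed 's/^Running job: //'
}

# Usage: start_and_get_job_id <log file> <jar> <main class> [args...]
# Prints the job ID (empty if none appeared); the hadoop client keeps running.
start_and_get_job_id() {
  local log_file="$1" pid id=""
  shift
  hadoop jar "$@" >"$log_file" 2>&1 &
  pid=$!
  while [[ -z "$id" ]] && kill -0 "$pid" 2>/dev/null; do
    sleep 2
    id=$(extract_mr_job_id <"$log_file")
  done
  # The client may have exited right after logging the ID, so check once more.
  [[ -n "$id" ]] || id=$(extract_mr_job_id <"$log_file")
  printf '%s\n' "$id"
}
```

Because only the ID printed by this particular client is parsed, there is no need to guess which entry of hadoop job -list belongs to the script.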

What are the different ways to check if the mapreduce program ran successfully

If we need to automate a MapReduce program or run it from a script, what are the different ways to check whether it ran successfully? One way is to check whether a _SUCCESS file was created in the output directory. Does the command "hadoop jar program.jar hdfs:/input.txt hdfs:/output" return 0 or 1 depending on success or failure?

Just like any other command in Linux, you can check the exit status of a hadoop jar command using the built-in variable $?.
You can use:
echo $?
after executing the hadoop jar command to check its status.
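The exit status ranges from 0 to 255. An exit status of zero means that the command executed successfully, while a non-zero value indicates that it failed. For hadoop jar, this reflects the job's outcome only if the driver's main method exits with a non-zero status on failure, for example System.exit(job.waitForCompletion(true) ? 0 : 1).
Edit: To see how to automate this or run it from a script, refer to the question "Hadoop job fails when invoked by cron".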
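Both checks mentioned in the question can be combined in one function. This is a sketch assuming the job writes through FileOutputCommitter, which creates the _SUCCESS marker by default, and that the output directory is passed separately; run_and_verify is an illustrative name.

```bash
#!/usr/bin/env bash
# Sketch combining both checks from the question: the exit status of `hadoop jar`
# and the _SUCCESS marker that FileOutputCommitter writes by default.
# Usage (hypothetical): run_and_verify <output dir> <jar> [args...]

run_and_verify() {
  local output_dir="$1"
  shift
  hadoop jar "$@"
  local rc=$?
  if (( rc != 0 )); then
    echo "hadoop jar failed with exit status $rc" >&2
    return "$rc"
  fi
  # `hdfs dfs -test -e` returns 0 if the path exists.
  if ! hdfs dfs -test -e "$output_dir/_SUCCESS"; then
    echo "no _SUCCESS marker in $output_dir" >&2
    return 1
  fi
  echo "job succeeded"
}
```

For the command in the question this would be called as run_and_verify hdfs:/output program.jar hdfs:/input.txt hdfs:/output.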

Running script on my local computer when jobs submitted by qsub on a server finish

I am submitting jobs via qsub to a server, and I want to analyze the results on my local machine after the jobs are finished. I found a way to submit the analysis job on the server, but I don't know how to run that script on my local machine.
jobID=$(qsub job.sh)
qsub -W depend=afterok:$jobID analyze.sh
But instead of the above, I want something like
if(qsub -W depend=afterok:$jobID) finished successfully
sh analyze.sh
else
some script
How can I accomplish the above task?
Thank you very much.

I've faced a similar issue and I'll try to sketch the solution that worked for me:
After submitting your actual job,
jobID=$(qsub job.sh)
I would create a loop in your script that checks if the job is still running using
qstat $jobID | grep $jobID | awk '{print $5}'
I'm not 100% sure that the status is in the 5th column, so you'd better double-check. While the job is idle, the status will be I or Q; while it is running, R; and afterwards, C.
Once it's finished, I usually grep the output files for signs that the run was a success or not, and then run the appropriate post-processing script.
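The polling approach above can be sketched as follows, assuming Torque-style qstat output (two header lines, state in the 5th column) and a machine where both qstat and the job's output files are available. The marker string searched for in the output file is a hypothetical placeholder for whatever "sign of success" your job writes; analyze.sh comes from the question.

```bash
#!/usr/bin/env bash
# Sketch of the polling approach for Torque-style qstat output, where the job
# state is the 5th column of the row after the two header lines.

# Print the single-letter state of a job (empty if qstat no longer knows it).
pbs_job_state() {
  qstat "$1" 2>/dev/null | awk 'NR > 2 {print $5; exit}'
}

# Block until the job is completed (C) or has disappeared from qstat.
wait_for_pbs_job() {
  local job_id="$1" interval="${2:-60}" state
  while true; do
    state=$(pbs_job_state "$job_id")
    case "$state" in
      C|"") return 0 ;;
      *) sleep "$interval" ;;
    esac
  done
}

# Hypothetical success check: grep the job's output file for a marker string,
# then run analyze.sh (from the question) or report the failure.
post_process() {
  local job_id="$1" output_file="$2" marker="$3"
  wait_for_pbs_job "$job_id" 60
  if grep -q -- "$marker" "$output_file"; then
    sh analyze.sh
  else
    echo "job $job_id finished without '$marker' in $output_file" >&2
    return 1
  fi
}
```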

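One thing that works for me is to run qsub synchronously with the option
qsub -sync y job.sh
This can be given either on the command line or as the line
#$ -sync y
in the job script (job.sh) itself.
qsub will then exit with code 0 only if the job (or all array jobs) finished successfully. Note that -sync y and the #$ directive prefix belong to Grid Engine's qsub; the PBS/Torque qsub used in the question may not support them.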
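With a synchronous qsub, the if/else from the question maps directly onto qsub's exit status. This is a sketch assuming Grid Engine's qsub; analyze.sh comes from the question, and on_failure.sh is a hypothetical stand-in for "some script".

```bash
#!/usr/bin/env bash
# Sketch of the question's if/else using Grid Engine's `qsub -sync y`, whose exit
# status reflects the job's result. analyze.sh comes from the question;
# on_failure.sh is a hypothetical stand-in for "some script".
run_job_then_analyze() {
  local job_script="$1" rc
  if qsub -sync y "$job_script"; then
    sh analyze.sh
  else
    rc=$?   # in the else branch, $? is still qsub's exit status
    echo "job failed (qsub exit status $rc)" >&2
    sh on_failure.sh
    return "$rc"
  fi
}
```

To run the analysis on the local machine while the job runs on the server, the same pattern should work with ssh user@server qsub -sync y job.sh as the condition, since ssh returns the remote command's exit status.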
