Autosys job not failing when the shell script fails - shell

I am moving existing manually-run shell scripts to execute via Autosys jobs. However, even after adding exit 1 for the failure cases, the Autosys job does not fail, and Autosys shows exit code 0.
I tried the simple script below:
#!/bin/ksh
exit 1
When I execute this, the Autosys job shows a SUCCESS status. I have not changed the success code or max success code in Autosys; everything is at the defaults. What am I missing?
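Before changing anything in Autosys, it can help to confirm what exit code the shell actually sees when the script runs on its own. A minimal sketch (the path is illustrative; /bin/sh is used so it runs anywhere, but the same applies to ksh):

```shell
# Write the minimal failing script to a temp location and run it directly.
cat > /tmp/test_exit.sh <<'EOF'
#!/bin/sh
exit 1
EOF
chmod +x /tmp/test_exit.sh

/tmp/test_exit.sh
echo "exit code seen by the shell: $?"
```

If this prints 1 but Autosys still reports 0, the problem likely lies in how the job is defined (for example, a job command that wraps the script in another shell invocation which swallows the code), not in the script itself.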

Related

bash: stop subshell script marked as failed if one step exits with an error

I am running a script through the SLURM job scheduler on HPC.
I am invoking a subshell script through a master script.
The subshell script contains several steps. One step sometimes fails because of the quality of the data; it is not required for the later steps, but when it fails, the whole subshell script is marked with FAILED status in the job scheduler. However, I need this subshell script to finish with COMPLETED status, because it is a dependency in my master script.
I tried setting up
set +e
in my subshell script right before the optional step, but it doesn't seem to work: I still get a non-zero exit code and FAILED status in the job scheduler.
In short: I need the subshell script to end with COMPLETED status in the job scheduler, regardless of whether that one particular step finishes with errors. I would appreciate help with this.
For Slurm jobs submitted with sbatch, the job exit code is taken to be the return code of the submission script itself. The return code of a Bash script is that of the last command in the script.
So if you just end your script with exit 0, Slurm should consider it COMPLETED no matter what.
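A sketch of a submission script along those lines (the sbatch directive and step commands are illustrative; `false` stands in for the flaky data-quality step):

```shell
#!/bin/bash
#SBATCH --job-name=example    # illustrative directive

echo "required step 1"

# Optional step: "|| true" absorbs its failure so execution continues
# and the exit status is reset to 0.
false || true

echo "required step 2"

exit 0    # explicit zero return code, so Slurm records COMPLETED
```

An explicit `exit 0` at the end is the belt-and-braces version; `|| true` on the optional step alone is usually enough, since the script's return code is that of its last command.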

SLURM status string on job completion / exit

How do I get the slurm job status (e.g. COMPLETED, FAILED, TIMEOUT, ...) on job completion (within the submission script)?
I.e. I want to write to separately keep track of jobs which are timed out / failed.
Currently I work with the exit code, however jobs which TIMEOUT also get exit code 0.
For future reference, here is how I finally do it.
Retrieve the job id at the beginning of the job and write some information (e.g. "${SLURM_JOB_ID} ${PWD}") to a summary file.
Then process this file and use something like sacct -X -n -o State -j ${jid} to get the job status.
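A sketch of that bookkeeping (the summary file name is illustrative, and the sacct call is guarded so the sketch also runs off-cluster):

```shell
#!/bin/sh
summary=$(mktemp)

# Step 1, inside the submission script: record the job id and directory.
# SLURM_JOB_ID is set by Slurm; the fallback keeps the sketch runnable anywhere.
echo "${SLURM_JOB_ID:-12345} ${PWD}" >> "$summary"

# Step 2, after the jobs finish: look up the final state of each recorded job.
while read -r jid wdir; do
    if command -v sacct >/dev/null 2>&1; then
        state=$(sacct -X -n -o State -j "$jid" | tr -d ' ')
    else
        state=UNKNOWN    # not on a Slurm cluster
    fi
    echo "$jid $state"    # e.g. COMPLETED, FAILED, TIMEOUT
done < "$summary"
```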

Autosys Job failing due to underlying script not being exited

I have an Autosys job that calls a wrapper script, which in turn calls a script that starts up a server and keeps running.
The problem I am facing is that once the job starts, Autosys keeps waiting for the script to exit; since it never does, the job eventually ends up in a FAILED state.
Can anyone provide a workaround for this?
For demo:
let's say I have Autosys job: A
wrapper script: W.sh
main restart script: serverrestart.sh
A
|---W.sh
|---serverrestart.sh ( Always running )
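One workaround is for W.sh to detach the server with nohup and then exit on its own, so Autosys gets a prompt exit code while the server keeps running. A sketch, with `sleep 30` standing in for serverrestart.sh (which never exits), the log path illustrative, and a kill added only to keep the sketch self-cleaning:

```shell
#!/bin/sh
# Hypothetical W.sh (written for /bin/sh; the same works in ksh).
# "sleep 30" stands in for ./serverrestart.sh, which never exits.
nohup sleep 30 > /tmp/server.log 2>&1 &
server_pid=$!

# Give the server a moment to come up, then confirm it is still alive.
sleep 1
if kill -0 "$server_pid" 2>/dev/null; then
    echo "server started (pid $server_pid)"
    kill "$server_pid"    # cleanup for this sketch only; drop in real use
    exit 0                # wrapper exits, so Autosys can mark SUCCESS
else
    echo "server failed to start" >&2
    exit 1
fi
```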

What are the different ways to check if the mapreduce program ran successfully

If we need to automate a mapreduce program or run it from a script, what are the different ways to check whether the mapreduce program ran successfully? One way is to check if a _SUCCESS file is created in the output directory. Does the command "hadoop jar program.jar hdfs:/input.txt hdfs:/output" return 0 or 1 based on success or failure?
Just like any other command in Linux, you can check the exit status of a hadoop jar command using the built-in variable $?.
You can use:
echo $?
after executing the hadoop jar command to check its status.
The exit status value varies from 0 to 255. An exit status of zero implies that the command executed successfully while a non-zero value indicates that the command failed.
Edit: To see how to achieve automation or to run from a script, refer to Hadoop job fails when invoked by cron.
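In a script, the same check might look like this sketch (the jar name and HDFS paths are illustrative, and a stand-in is used when hadoop is not on PATH so the pattern itself stays runnable):

```shell
#!/bin/sh
run_job() {
    if command -v hadoop >/dev/null 2>&1; then
        hadoop jar program.jar hdfs:/input.txt hdfs:/output
    else
        true    # stand-in success when hadoop is not installed
    fi
}

run_job
rc=$?
if [ "$rc" -ne 0 ]; then
    echo "MapReduce job failed (exit code $rc)" >&2
    exit "$rc"
fi
# The _SUCCESS marker can be checked the same way:
# "hadoop fs -test -e hdfs:/output/_SUCCESS" also returns zero/non-zero.
echo "MapReduce job succeeded"
```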

Terminate shell script if stored proc fails, to capture status in my autosys job

I have a Korn shell script that executes my SQL stored proc, and my Autosys job invokes this script. I want to capture the status of the stored proc in my Autosys job. Right now, even if the stored proc fails, Autosys records success because the shell script itself succeeds (I can capture the error in the script, but the script itself doesn't fail). How do I fail the script when my stored proc fails, and therefore fail the Autosys job?
Exit with a non-zero return code, e.g.
exit 1
