Running script on my local computer when jobs submitted by qsub on a server finish - bash

I am submitting jobs via qsub to a server and want to analyze the results on my local machine after the jobs finish. I can find a way to submit the analysis job on the server, but I don't know how to run that script on my local machine.
jobID=$(qsub job.sh)
qsub -W depend=afterok:$jobID analyze.sh
But instead of the above, I want something like
if (qsub -W depend=afterok:$jobID) finished successfully
    sh analyze.sh
else
    some script
How can I accomplish the above task?
Thank you very much.

I've faced a similar issue and I'll try to sketch the solution that worked for me:
After submitting your actual job,
jobID=$(qsub job.sh)
I would create a loop in your script that checks if the job is still running using
qstat $jobID | grep $jobID | awk '{print $5}'
I'm not 100% sure the status is in the 5th column, so you had better double-check. While the job is idle or queued the status will be I or Q, R while it is running, and C once it has completed.
Once it's finished, I usually grep the output files for signs that the run was a success or not, and then run the appropriate post-processing script.
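A rough sketch of that loop (assuming a Torque/PBS-style qstat with the state in the 5th column; the sleep interval, the "SUCCESS" marker, and handle_failure.sh are placeholders to adapt):
jobID=$(qsub job.sh)
# poll until the job reports C (completed) or disappears from qstat
while true; do
    status=$(qstat "$jobID" 2>/dev/null | grep "$jobID" | awk '{print $5}')
    if [ -z "$status" ] || [ "$status" = "C" ]; then
        break
    fi
    sleep 60
done
# decide which post-processing to run based on the job's output
if grep -q "SUCCESS" job.sh.o*; then
    sh analyze.sh
else
    sh handle_failure.sh
fi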

One thing that works for me is to use qsub synchronously with the option
qsub -sync y job.sh
(either on the command line, or as
#$ -sync y
in the script job.sh itself).
qsub will then exit with code 0 only if the job (or, for an array job, all of its tasks) finished successfully.
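Applied to the question, the asker's if/else could then be written roughly like this (a sketch; handle_failure.sh is a hypothetical placeholder):
if qsub -sync y job.sh; then
    sh analyze.sh          # job exited with status 0
else
    sh handle_failure.sh   # fallback for a failed job
fi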

Related

jobs command result is empty when process is run through script

I need to run rsync in the background from a shell script, but once it has started I need to monitor the status of those jobs from the shell.
The jobs command returns nothing when it is run in the shell after the script exits, yet ps -ef | grep rsync shows that rsync is still running.
I could check the status from within the script, but I need to run the script multiple times, each time pushing with a different ip.txt file, so I can't keep the script running just to check the job status.
Here is the script:
for i in `cat $ip.txt`; do
    rsync -avzh $directory/ user@"$i":/cygdrive/c/test/$directory > /dev/null 2>&1 &
done
jobs    # shows the job status while still inside the shell script
exit 1
Output of jobs command is empty after the shell script exits:
root@host001:~# jobs
root@host001:~#
What could be the reason, and how can I get the status of the jobs while rsync is running in the background? I can't find anything online about this.
Since your shell (the one from which you execute jobs) did not start rsync, it doesn't know anything about it. There are different approaches to fixing that, but it boils down to starting the background process from your shell. For example, you can start the script you have using the source BASH command instead of executing it in a separate process. Of course, you'd have to remove the exit 1 at the end, because that exits your shell otherwise.
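A minimal sketch of that approach (push.sh is a hypothetical name for the script above, with the exit 1 removed):
source ./push.sh   # the rsync loop now runs in the current shell
jobs               # lists the background rsync jobs started by the script
wait               # optionally block until all of them finish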

LSF - BSUB Running a script if the job is killed

I'm working with LSF, running bsub commands.
I'm using the -Ep switch to run a post-exec script. This works great until the job is killed or hits a memory limit, run limit, etc.
Is there any way for the job to detect that it is running out of resources and then run the script, or to force the script to run even if the job has been killed?
I guess my other option is submitting a second job with a dependency on the first, which would run the "post exec" script when it finishes.
Any thoughts?
Kind Regards,
TheBigPeeler
From the documentation, you should be seeing the behaviour that you want.
A post-execution command runs after the job finishes, regardless of
the exit state of the job. Once a post-execution command is associated
with a job, that command runs even if the job fails. You cannot
configure the post-execution command to run only under certain
conditions.
I thought that maybe the interaction with JOB_INCLUDE_POSTEXEC (lsb.params) could account for the difference, but from my test the post-exec still runs in both cases. I used runlimit (bsub -W) to trigger the job kill.
Is it possible that the post exec is running, but exits early?
What version of LSF are you using? (What's the output of mbatchd -V and sbatchd -V)
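If the post-exec really can't be made to fire, the dependency fallback mentioned in the question could look roughly like this (a sketch; myjob.sh and cleanup.sh are placeholder names):
# submit the main job and pull its ID out of bsub's "Job <12345> is submitted ..." message
out=$(bsub ./myjob.sh)
jobid=$(echo "$out" | tr -d '<>' | awk '{print $2}')
# ended() is true once the job finishes for any reason, including kills and limit overruns
bsub -w "ended($jobid)" ./cleanup.sh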

Force shell script to run tasks in sequence

I'm running a shell script that executes several tasks. The problem is that the script does not wait for one task to end before starting the next one. It should instead wait for each task to complete before the next one starts. Is there a way to do that? My script looks like this:
sbatch retr.sc 19860101 19860630
scp EN/EN1986* myhostname@myhost.it:/storage/myhostname/MetFiles
The first command runs retr.sc, which retrieves files and takes roughly half an hour. The second command, though, runs straight away and copies only the files that already exist. I want the scp command to run only once the first one has completed.
thanks in advance
You have several options:
use srun rather than sbatch: srun retr.sc 19860101 19860630
use sbatch for the second command as well, and make it depend on the first one
like this:
RES=$(sbatch retr.sc 19860101 19860630)
sbatch --depend=after:${RES##* } --wrap "scp EN/EN1986* myhostname@myhost.it:/storage/myhostname/MetFiles"
create one script that incorporates both retr.sc and scp and submit that script, as sketched after the man-page excerpt below.
sbatch exits immediately on submitting the job to slurm.
salloc will wait for the job to finish before exiting.
from the man page:
$ salloc -N16 xterm
salloc: Granted job allocation 65537
(at this point the xterm appears, and salloc waits for xterm to exit)
salloc: Relinquishing job allocation 65537
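For the third option, the combined script could look roughly like this (a sketch reusing the filenames and destination from the question; it assumes retr.sc can be run directly inside the batch job):
#!/bin/bash
#SBATCH --job-name=retrieve_and_copy
# retrieve the files first; the next line only runs once retr.sc has finished
./retr.sc 19860101 19860630
# then copy the results to the remote host
scp EN/EN1986* myhostname@myhost.it:/storage/myhostname/MetFiles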
Thanks for your replies.
I've sorted it out this way:
RES=$(sbatch retr.sc $date1 $date2)
array=(${RES// / })
JOBID=${array[3]}
year1=${date1:0:4}
sbatch --dependency=afterok:${JOBID} scp.sh $year1
where scp.sh is the script that transfers the files to my local machine.
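For completeness, that scp.sh could be as simple as the following (a guess reconstructed from the question; only the year argument is known from the post):
#!/bin/bash
# the year is passed as the first argument by the sbatch line above
year=$1
scp EN/EN${year}* myhostname@myhost.it:/storage/myhostname/MetFiles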

how to get the job_id in the sun grid system using qsub

Consider a script, "run.sh", to be sent to a cluster job queue via qsub,
qsub ./run.sh
My question is how do I get the number of the process -- the one that appears as ${PID} on the files *.o${PID} and *.e${PID} -- within the script run.sh?
Does qsub export it? Under what name?
Well, apparently the qsub man page does not mention it, but this page says that the variable $JOB_ID is set to the number I was asking for.
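So inside run.sh something like this should work under Sun Grid Engine (the results directory is just an example):
#!/bin/bash
# $JOB_ID is set by the Grid Engine for each job
echo "Running as job $JOB_ID"
mkdir -p results/$JOB_ID      # e.g. keep each job's output in its own directory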

Making qsub block until job is done?

Currently, I have a driver program that runs several thousand instances of a "payload" program and does some post-processing of the output. The driver currently calls the payload program directly, using a shell() function, from multiple threads. The shell() function executes a command in the current working directory, blocks until the command is finished running, and returns the data that was sent to stdout by the command. This works well on a single multicore machine. I want to modify the driver to submit qsub jobs to a large compute cluster instead, for more parallelism.
Is there a way to make the qsub command output its results to stdout instead of a file and block until the job is finished? Basically, I want it to act as much like "normal" execution of a command as possible, so that I can parallelize to the cluster with as little modification of my driver program as possible.
Edit: I thought all the grid engines were pretty much standardized. If they're not and it matters, I'm using Torque.
You don't mention what queuing system you're using, but SGE supports the '-sync y' option to qsub which will cause it to block until the job completes or exits.
In TORQUE this is done using the -x and -I options. qsub -I specifies that the job should be interactive and -x says to run only the specified command. For example:
qsub -I -x myscript.sh
will not return until myscript.sh finishes execution.
In PBS you can use qsub -Wblock=true <command>
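Putting it together, the driver could swap its shell() call for a blocking submission along these lines (a sketch assuming PBS Professional; payload.sh and out.txt are placeholders):
# submit, block until the job ends, then read the stdout PBS wrote to the output file
qsub -W block=true -o out.txt -e err.txt ./payload.sh
cat out.txt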
