(shell script - qsub) wait for submitted job to complete before next command - shell

My shell script involves qsub job submission and then copying the file generated by that job to some other location. How does one do that?
Here is what my shell script looks like:
...
qsub synplify.csh
cp ./rev_1/netlist.vqm ~/sample
...
Here, the synplify.csh job is submitted to the server, but qsub returns before the job has completed, which clears the way for the second line. The second line is therefore executed while the first job is still being processed. I want the second line to run only after the job has completed.

Use the -sync y option.
qsub -sync y synplify.csh
cp ./rev_1/netlist.vqm ~/sample
From the man page:
-sync y causes qsub to wait for the job to complete
before exiting.
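Put together, the script from the question could look like this minimal sketch (it assumes an SGE cluster, where qsub run with -sync y returns the job's exit status):
#!/bin/sh
# Submit and block until the job finishes (SGE's -sync y)
if qsub -sync y synplify.csh; then
    # The job completed successfully, so the netlist should exist now
    cp ./rev_1/netlist.vqm ~/sample
else
    echo "synplify job failed" >&2
fi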

You can chain commands as described here:
https://unix.stackexchange.com/questions/63648/how-to-send-many-commands-to-shell-and-wait-for-the-command-behind-ends
Alternatively, you can submit separate scripts that use job dependencies (with afterok):
http://docs.adaptivecomputing.com/torque/6-0-2/adminGuide/help.htm#topics/moabWorkloadManager/topics/jobAdministration/jobdependencies.html
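For the dependency route, a rough sketch (this assumes Torque/PBS-style -W depend syntax as in the linked docs, and a hypothetical copy_netlist.sh containing the cp line; SGE would use -hold_jid instead):
# Submit the synthesis job and capture its job ID
JOBID=$(qsub synplify.csh)
# The copy job starts only if the first job finishes with exit status 0
qsub -W depend=afterok:$JOBID copy_netlist.sh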

Related

Script didn't Finish execution but cron job started again

I am trying to run a cron job that executes my shell script; the shell script runs Hive and Pig scripts. I have set the cron job to run every 2 minutes, but it starts again before my shell script has finished. Is that going to affect my result, or will the next run only start once the script has finished executing? I am in a bit of a dilemma here. Please help.
Thanks
I think there are two ways to better resolve this, a long way and a short way:
Long way (probably most correct):
Use something like Luigi to manage job dependencies, then run that with Cron (it won't run more than one of the same job).
Luigi will handle all your job dependencies for you, and you can make sure that a particular job only executes once. It's a little more work to set up, but it's really worth it.
Short Way:
Lock files have already been mentioned, but you can do this on HDFS too, that way it doesn't depend on where you run the cron job from.
Instead of checking for a lock file, put a flag on HDFS when you start and finish the job, and have this as a standard thing in all of your cron jobs:
# at start
hadoop fs -touchz /jobs/job1/2016-07-01/_STARTED
# at finish
hadoop fs -touchz /jobs/job1/2016-07-01/_COMPLETED
# Then check them (pseudocode):
if(!started && !completed): run_job; add_completed; remove_started
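A rough bash sketch of that pseudocode (hedged: hadoop fs -test -e returns 0 when a path exists; the /jobs/job1 layout and the run_job.sh wrapper are just placeholders):
#!/bin/bash
DATE=$(date +%F)
BASE=/jobs/job1/$DATE

# Only run if this date's job has neither started nor completed yet
if ! hadoop fs -test -e "$BASE/_STARTED" && ! hadoop fs -test -e "$BASE/_COMPLETED"; then
    hadoop fs -mkdir -p "$BASE"
    hadoop fs -touchz "$BASE/_STARTED"
    ./run_job.sh                               # placeholder for the hive & pig work
    hadoop fs -touchz "$BASE/_COMPLETED"
    hadoop fs -rm "$BASE/_STARTED"
fi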
At the start of the script, have a check:
#!/bin/bash
if [ -e /tmp/file.lock ]; then
    rm /tmp/file.lock   # remove the lock file and continue
else
    exit                # no lock file, so the previous execution has not completed
fi
.... # Your script here
touch /tmp/file.lock    # mark this run as completed
There are many other ways of achieving the same result; this is just a simple example. (Note that with this scheme /tmp/file.lock has to exist before the very first run, otherwise the script exits immediately.)
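One such alternative, if the cron job always runs on the same machine, is flock(1); it holds the lock only while the script is running and releases it automatically even if the script crashes (a sketch, not from the answer above):
#!/bin/bash
# Open a file descriptor on the lock file and try to take the lock without waiting
exec 200>/tmp/myjob.lock
flock -n 200 || exit 1          # another instance is still running: bail out

# ... hive & pig work here ...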

Force shell script to run tasks in sequence

I'm running a shell script that executes several tasks. The thing is that the script does not wait for a task to end before starting the next one. My script should work differently, waiting for one task to complete before the next one starts. Is there a way to do that? My script looks like this:
sbatch retr.sc 19860101 19860630
scp EN/EN1986* myhostname#myhost.it:/storage/myhostname/MetFiles
The first command runs retr.sc, which retrieves files and takes roughly half an hour. The second command, however, runs right away and only moves some of the files to the destination. I want the scp command to run only when the first has completed.
Thanks in advance
You have several options:
use srun rather than sbatch: srun retr.sc 19860101 19860630
use sbatch for the second command as well, and make it depend on the first one
like this:
RES=$(sbatch retr.sc 19860101 19860630)
sbatch --depend=after:${RES##* } --wrap "scp EN/EN1986* myhostname#myhost.it:/storage/myhostname/MetFiles"
create one script that incorporates both retr.sc and scp and submit that script.
sbatch exits immediately after submitting the job to Slurm.
salloc will wait for the job to finish before exiting.
from the man page:
$ salloc -N16 xterm
salloc: Granted job allocation 65537
(at this point the xterm appears, and salloc waits for xterm to exit)
salloc: Relinquishing job allocation 65537
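A sketch of the third option, a single submission script that does both steps (this assumes retr.sc can be run as an ordinary script inside the allocation, and uses a placeholder user@host for the scp target):
#!/bin/bash
#SBATCH --job-name=retr_and_copy

# Retrieve the files; this blocks until retr.sc is done
./retr.sc 19860101 19860630

# Copy only if the retrieval succeeded
if [ $? -eq 0 ]; then
    scp EN/EN1986* myhostname@myhost.it:/storage/myhostname/MetFiles
fi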
Thanks for your replies.
I've sorted it out this way:
RES=$(sbatch retr.sc $date1 $date2)
array=(${RES// / })
JOBID=${array[3]}
year1=${date1:0:4}
sbatch --dependency=afterok:${JOBID} scp.sh $year1
where scp.sh is the script for transferring the file to my local machine
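If the installed Slurm is recent enough to support it, sbatch --parsable prints just the job ID, which avoids the word-splitting step (an assumption about your Slurm version):
JOBID=$(sbatch --parsable retr.sc "$date1" "$date2")
year1=${date1:0:4}
sbatch --dependency=afterok:${JOBID} scp.sh "$year1"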

Running script on my local computer when jobs submitted by qsub on a server finish

I am submitting jobs via qsub to a server, and then want to analyze the results on my local machine after the jobs are finished. I can find a way to submit the analysis job on the server, but I don't know how to run that script on my local machine.
jobID=$(qsub job.sh)
qsub -W depend=afterok:$jobID analyze.sh
But instead of the above, I want something like
if(qsub -W depend=afterok:$jobID) finished successfully
sh analyze.sh
else
some script
How can I accomplish the above task?
Thank you very much.
I've faced a similar issue and I'll try to sketch the solution that worked for me:
After submitting your actual job,
jobID=$(qsub job.sh)
I would create a loop in your script that checks if the job is still running using
qstat $jobID | grep $jobID | awk '{print $5}'
Although I'm not 100% sure the status is in the 5th column, so you had better double-check. While the job is idling the status will be I or Q, while it is running it will be R, and afterwards C.
Once it's finished, I usually grep the output files for signs that the run was a success or not, and then run the appropriate post-processing script.
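A rough sketch of that polling loop (the column number, the sleep interval, the output-file name, and the "RUN FINISHED OK" success marker are all assumptions to adapt to your own setup):
#!/bin/bash
jobID=$(qsub job.sh)

# Poll until the job reaches state C (completed) or disappears from qstat
while true; do
    state=$(qstat "$jobID" | grep "$jobID" | awk '{print $5}')
    if [ "$state" = "C" ] || [ -z "$state" ]; then
        break
    fi
    sleep 60
done

# Check the job's output for a success marker before post-processing
if grep -q "RUN FINISHED OK" job.sh.o*; then
    sh analyze.sh
else
    sh handle_failure.sh                       # hypothetical fallback script
fi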
One thing that works for me is to run qsub synchronously with the option
qsub -sync y job.sh
(either on the command line or as
#$ -sync y
in the script (job.sh) itself).
qsub will then exit with code 0 only if the job (or all array jobs) have finished successfully.
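That maps directly onto the if/else sketched in the question, for example:
if qsub -sync y job.sh; then
    sh analyze.sh
else
    sh some_other_script.sh                    # hypothetical failure handler
fi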

"qsub -now" equivalent using bsub

In SGE, we have
qsub -now yes/no <command>
By "-now yes" the job is scheduled immediately(if possible) or not at all . We are not put in pending queue .
By "-now no " the job is put in pending queue if it cannot be executed immediately .
But in LSF , we have qsub's equivalent as bsub .
in bsub, we are put in pending queue, if it cannot be executed immediately. We don't have option as "-now yes" as in qsub .
Do we something in bsub as "qsub -now"
P.S : One solution is that we can check for some time(some secondss) after running bsub, if we are scheduled or not and then exit . I am searching for a more elegant way .
I found an answer, the LSF way.
LSF does provide a way to give up on a job if it cannot schedule the resources. There is an environment variable, LSF_NIOS_PEND_TIMEOUT (specified in minutes), which terminates the job if it is still in the pending queue after that time.
env LSF_NIOS_PEND_TIMEOUT=1 bsub -Is -m host /bin/bash
From somewhere on the web:
LSF_NIOS_PEND_TIMEOUT
Syntax
LSF_NIOS_PEND_TIMEOUT=minutes
Description
Applies only to interactive batch jobs.
Maximum amount of time that an interactive batch job can remain pending.
If this parameter is defined, and an interactive batch job is pending for longer than the specified time, the interactive batch job is terminated.
Valid values
Any integer greater than zero
LSF doesn't have the same thing. You could use expect with a timeout. LSF will output something like this when the job starts, so your expect script could expect <<Starting on. (But this is basically what your P.S. says.)
$ bsub -Is -m hostA /bin/bash
Job <7536> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on hostA>>
hostA$
You could maybe use lsrun. But it won't work with the batch system to allocate a slot or other resource.
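If expect feels too heavy, the approach from the P.S. can also be scripted directly with bjobs and bkill (a sketch; the 10-second window and the assumption that STAT is the third bjobs column may need adjusting):
# Submit, then give the scheduler a short window to dispatch the job
out=$(bsub < job.sh)
jobid=$(echo "$out" | sed 's/.*Job <\([0-9]*\)>.*/\1/')

sleep 10
stat=$(bjobs "$jobid" | awk 'NR==2 {print $3}')    # STAT column of the bjobs output

if [ "$stat" = "PEND" ]; then
    bkill "$jobid"          # still pending after the window: give up, like -now yes
fi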

Making qsub block until job is done?

Currently, I have a driver program that runs several thousand instances of a "payload" program and does some post-processing of the output. The driver currently calls the payload program directly, using a shell() function, from multiple threads. The shell() function executes a command in the current working directory, blocks until the command is finished running, and returns the data that was sent to stdout by the command. This works well on a single multicore machine. I want to modify the driver to submit qsub jobs to a large compute cluster instead, for more parallelism.
Is there a way to make the qsub command output its results to stdout instead of a file and block until the job is finished? Basically, I want it to act as much like "normal" execution of a command as possible, so that I can parallelize to the cluster with as little modification of my driver program as possible.
Edit: I thought all the grid engines were pretty much standardized. If they're not and it matters, I'm using Torque.
You don't mention what queuing system you're using, but SGE supports the '-sync y' option to qsub which will cause it to block until the job completes or exits.
In TORQUE this is done using the -x and -I options. qsub -I specifies that it should be interactive and -x says run only the command specified. For example:
qsub -I -x myscript.sh
will not return until myscript.sh finishes execution.
In PBS you can use qsub -Wblock=true <command>
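To also get the job's output back on stdout, one option is to wrap the blocking submission and read the job's output file afterwards, for example in PBS (a sketch; the -W block=true wait, the -j oe join, and the timing of when the output file appears are assumptions to verify on your cluster):
#!/bin/bash
# Hypothetical stand-in for the driver's shell() call: run the payload through
# the batch system, block until it is done, then print what it wrote to stdout.
run_payload() {
    local out
    out=$(mktemp)
    qsub -W block=true -j oe -o "$out" "$1" > /dev/null   # wait for completion
    cat "$out"                                            # payload's stdout
    rm -f "$out"
}

result=$(run_payload payload.sh)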
