how to determine how much time autosys job takes to go to "success" state - bash

Can anyone help me, I am trying to find the current status of an autosys job like below:
autorep -j jobname -d -L0 | grep "RUNNING" and if it is not running, I have to force start the same job.
After the same job is successfully restarted, I have to wait until the job status is SUCCESS to continue with my rest of the commands in my shell script. Please guide, many thanks for your help. Below is my code
status=`autorep -j jobname -d -L0 | grep "RUNNING"| awk '{print$1}'
echo $status
if["$status"=="RUNNING"];then
echo "The job is in RUNNING state"
else
echo "Force starting the job now !"
fsj jobname
fi
and, after the job status is success, I can continue with rest of my scripting. But the question is, how to know if I have to put sleep for 30 mins, 40 mins etc (appx job runs for 30 mins)
Is there any way to automatically trigger the next command,after the job is successful...instead of using sleep.
Thanks again.

while sleep 900; do autorep -j jobname -d -L0 |
awk '$1~/RUNNING/{r=1} END{if(!r) system("fsj jobname")}'; done &
This background loop will check the first column of your autorep output for "RUNNING" every 15 minutes, executing fsj jobname if it doesn't find a match. You might also be able to track the PID of the relevant process if you can identify it.

Related

question on using bwait to wait for multiple bsub jobs to finish

I am new to using LSF (been using PBS/Torque all along).
I need to write code/logic to make sure all bsub jobs finish before other commands/jobs can be fired.
Here is what I have done: I have a master shell script which calls multiple other shell scripts via bsub commands. I capture the job ids from bsub in a log file and I need to ensure that all jobs get finished before the downstream shell script should execute its other commands.
Master shell script
#!/bin/bash
...Code not shown for brevity..
"Command 1 invoked with multiple bsubs" > log_cmd_1.txt
Need Code logic to use bwait before downstream Commands can be used
"Command 2 will be invoked with multiple bsubs" > log_cmd_2.txt
and so on
stdout captured from Command 1 within the Master Shell script is stored in log_cmd_1.txt which looks like this
Submitting Sample 101
Job <545> is submitted to .
Submitting Sample 102
Job <546> is submitted to .
Submitting Sample 103
Job <547> is submitted to .
Submitting Sample 104
Job <548> is submitted to .
I have used the codeblock shown below after Command 1 in the master shell script.
However, it does not seem to work for my situation. Looks like I would have gotten the whole thing wrong below.
while sleep 30m;
do
#the below gets the JobId from the log_cmd_1.txt and tries bwait
grep '^Job' <path_to>/log_cmd_1.txt | perl -pe 's!.*?<(\d+)>.*!$1!' | while read -r line; do res=$(bwait -w "done($line)");echo $res; done 1>
<path_to>/running.txt;
# the below sed command deletes lines that start with Space
sed '/^\s*$/d' running.txt > running2.txt;
# -s file check operator means "file is not zero size"
if [ -s $WORK_DIR/logs/running2.txt ]
then
echo "Jobs still running";
else
echo "Jobs complete";
break;
fi
done
The question: What's the correct way to do this using bwait within the master shell script.
Thanks in advance.
bwait will block until the condition is satisfied, so the loops are probably not neecessary. Note that since you're using done, if the job fails then bwait will exit and inform you that the condition can never be satisfied. Make sure to check that case.
What you have should work. At least the following test worked for me.
#!/bin/bash
# "Command 1 invoked with multiple bsubs" > log_cmd_1.txt
( bsub sleep 0; bsub sleep 0 ) > log_cmd_1.txt
# Need Code logic to use bwait before downstream Commands can be used
while sleep 1
do
#the below gets the JobId from the log_cmd_1.txt and tries bwait
grep '^Job' log_cmd_1.txt | perl -pe 's!.*?<(\d+)>.*!$1!' | while read -r line; do res=$(bwait -w "done($line)");echo "$res"; done 1> running.txt;
# the below sed command deletes lines that start with Space
sed '/^\s*$/d' running.txt > running2.txt;
# -s file check operator means "file is not zero size"
if [ -s running2.txt ]
then
echo "Jobs still running";
else
echo "Jobs complete";
break;
fi
done
Another way to do it. Which may is a little cleaner, is to use job arrays and job dependencies. Job arrays will combine several pieces of work that can be managed as a single job. So your
"Command 1 invoked with multiple bsubs" > log_cmd_1.txt
could be submitted as a single job array. You'll need a driver script that can launch the individual jobs. Here's an example driver script.
$ cat runbatch1.sh
#!/bin/bash
# $LSB_JOBINDEX goes from 1 to 10
if [ "$LSB_JOBINDEX" -eq 1 ]; then
# do the work for job batch 1, job 1
...
elif [ "$LSB_JOBINDEX" -eq 2 ]; then
# etc
...
fi
Then you can submit the job array like this.
bsub -J 'batch1[1-10]' sh runbatch1.sh
This command will run 10 job array elements. The driver script's environment will use the variable LSB_JOB_INDEX to let you know which element the driver is running. Since the array has a name, batch, it's easier to manage. You can submit a second job array that won't start until all elements of the first have completed successfully. The second array is submitted with this command.
bsub -w 'done(batch1)' -J 'batch2[1-10]' sh runbatch2.sh
I hope that this helps.

Checking for status of qsub jobs running within shell script

I have been given a c shell script that launches 800 individual qsubs for a sample. I need to run this script on more than 500 samples (listed in samples.txt). To automate the process, I thought about running the script (named SrchDriver) using the following bash shell script:
#!/bin/sh
for item in $(cat samples.txt)
do
(cd dir_"$item"/MAPGAPS && SrchDriver "$item"_Out 3)
done
This script would launch the SrchDriver script for all samples one right after another which would result in too many jobs on the server at one time. I would like to run only one sample at a time by waiting for all qsubs to finish for a particular sample.
What is the best way to put in a check for running/waiting jobs for a sample and holding the launch of the Srchdriver script for additional samples until all jobs are finished for the current sample?
I was thinking to first wait for 30 seconds and then check status of the qsubs (name of jobs is mapgaps). Next, I wanted to use a while loop to check the status every 30 seconds. Once the status is no longer 0, then proceed to the next sample. Would this be correct?
sleep 30
qstat | grep mapgaps &> /dev/null
while [ $? -eq 0 ];
do
sleep 30
qstat | grep mapgaps &> /dev/null
done;
If correct, how would I combine it with my for-loop? Would the following code below be correct?
#!/bin/sh
for item in $(cat samples.txt)
do
(cd dir_"$item"/MAPGAPS && SrchDriver "$item"_Out 3)
sleep 30
qstat | grep mapgaps &> /dev/null
status=$?
while [ $status = 0 ]
do
sleep 30
qstat | grep mapgaps &> /dev/null
status=$?
done
done
Thanks in advance for help. Please let me know if more information is needed.
Your script should work as is, indeed. The logic is sound and the syntax is correct.
A small improvement: the while statement can take the return status of a command directly, without using $?, so you could write your script like this:
#!/bin/sh
for item in $(cat samples.txt)
do
(cd dir_"$item"/MAPGAPS && SrchDriver "$item"_Out 3)
sleep 30
while qstat | grep mapgaps &> /dev/null
do
sleep 30
done
done

Cron job won't start again after I stopped it?

I wrote a script to run constantly on startup. If for whatever reason the script were to fail, I wrote a second script to check if it has failed, and if so, run the first script again. I then set this second script as a cronjob to run every minute so that it is constantly checking if the first script is alive.
So to test this, I reboot my system. I can see in htop that the first script is running from start up as expected. Good. I kill the process to test the second script. Sure enough, the second script starts the first script again. Still good. I then kill this process, but the second script won't run again now. It still updates a txt file when I manually start the first script, but the second script just doesn't start the first script like it's supposed to. Is it because I killed the cronjob? Restarting the cron service doesn't fix anything though, so I don't know why my second script isn't running again at all.
First script:
#!/bin/bash
stamp=$(date +%Y%m%d-%H%M)
timeout 10d tcpdump -i eth0 -s 96 -z gzip -C 10 -w /home/user/Documents/${stamp}
Second script:
#!/bin/bash
echo "not running" > /home/working.txt
if (( $(ps -ef | grep -v grep | grep tcpdump.sh | wc -l) > 0 ))
then
echo "tcpdump is running!!!" > /home/working.txt
else
/usr/local/bin/tcpdump.sh start
fi
Any help?
You would probably be better off running a simple for loop as the main script, and that kicks off the tcpdump script in the background, so something like:
#!/bin/bash
while true; do
if ps -ef | grep -v grep | grep -q tcpdump; then
: tcpdump running OK
else
# tcpdump not running - start it off
nohup /usr/local/bin/firstscript.sh start &
fi
sleep 30
done
This checks that "tcpdump.sh" is in the output of the "ps -ef" command - if it is, then do nothing (note that you must have an actual command between the "then" and "else" - the ":" command, which just takes it s arguments and ignores them, is sufficient). If it isn't running, start the first script in the background. Then sleep 30 seconds and check again. (Yes, I could have inverted the test so that I didn't need an empty "then" arm, but it would have made the code less obvious)
You put this script as the one which starts at boot time.
Edit: Do you really want to check for "tcpdump.sh"? Is that what the first script is actually called? Assuming that you actually want to check for the tcpdump program, you could use:
if pgrep tcpdump; then

Hold remainder of shell script commands until PBS qsub array job completes

I am very new to shell scripting, and I am trying to write a shell pipeline that submits multiple qsub jobs, but has several commands to run in between these qsubs, which are contingent on the most recent job completing. I have been researching multiple ways to try and hold the shell script from proceeding after submission of a qsub job, but none have been successful.
The simplest chunk of code I can provide to illustrate the issue is as follows:
THREADS=`wc -l < list1.txt`
qsub -V -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
There are obviously other lines of code after this that are actually contingent on firstjob.sh finishing, but I have omitted them here for clarity. I have tried the following methods of pausing/holding the script:
1) Only using wait, which is supposed to stop the script until all background programs are completed. This pushed right past the wait and printed the echo statement to the terminal while the array job was still running. My guess is this is occurring because once the qsub job is submitted, is exits and wait thinks it has completed?
qsub -V -t 1-$THREADS firstjob.sh
wait
echo "firstjob.sh completed"
2) Setting the job to a variable, echoing that variable to submit the job, and using the the entire job ID along with wait to pause. The echo command should wait until all elements of the array job have completed.The error message is shown following the code, within the code block.
job1=$(qsub -V -t 1-$THREADS firstjob.sh)
echo "$job1"
wait $job1
echo "firstjob.sh completed"
####ERROR RECEIVED####
-bash: wait: `4585057[].cluster-name.local': not a pid or valid job spec
3) Using the -sync y for qsub. This should prevent it from exiting the qsub until the job is complete, acting as an effective pause...I had hoped. Error in comment after the commands. For some reason it is not reading the -sync option correctly?
qsub -V -sync y -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
####ERROR RECEIVED####
qsub: script file 'y' cannot be loaded - No such file or directory
4) Using a dummy shell script (the dummy just makes an empty file) so that I could use the -W depend=afterok: option of qsub to pause the script. This again pushes right past to the echo statement without any pause for submitting the dummy script. Both jobs get submitted, one right after the other, no pause.
job1=$(qsub -V -t 1-$THREADS demux.sh)
echo "$job1"
check=$(qsub -V -W depend=afterok:$job1 dummy.sh)
echo "$check"
echo "firstjob.sh completed"
Some further details regarding the script:
Each job submission is an array job.
The pipeline is being run in the terminal using a command resembling the following, so that I may provide it with 3 inputs: source Pipeline.sh -r list1.txt -d /workingDir/ -s list2.txt
I am certain that the firstjob.sh has not actually completed running because I see them in the queue when I use showq.
Perhaps there is an easy fix in most of these scenarios, but being new to all this, I am really struggling. I have to use this method in 8-10 places throughout the script, so it is really hindering progress. Would appreciate any assistance. Thanks.
POST EDIT 1
Here is the code contained in firstjob.sh...though doubtful that it will help. Everything in here functions as expected, always produces the correct results.
\#! /bin/bash
\#PBS -S /bin/bash
\#PBS -N demux
\#PBS -l walltime=72:00:00
\#PBS -j oe
\#PBS -l nodes=1:ppn=4
\#PBS -l mem=15gb
module load biotools
cd ${WORKDIR}/rawFQs/
INFILE=`head -$PBS_ARRAYID ${WORKDIR}${RAWFQ} | tail -1`
BASE=`basename "$INFILE" .fq.gz`
zcat $INFILE | fastx_barcode_splitter.pl --bcfile ${WORKDIR}/rawFQs/DemuxLists/${BASE}_sheet4splitter.txt --prefix ${WORKDIR}/fastqs/ --bol --suffix ".fq"
I just tried using -sync y, and that worked for me, so good idea there... Not sure what's different about your setup.
But a couple other things you could try involve your main script knowing the status of the qsub jobs you're running. One idea is that you could have your main script check the status of your job using qstat and wait until it finishes before proceeding.
Alternatively, you could have the first job write to a file as its last step (or, as you suggested, set up a dummy job that waits for the first job to finish). Then in your main script, you can test to see whether that file has been written before going on.

How to know the PBS batch job submit time inside the script being excuted?

I'm using the PBS qsub to run a script on a cluster that must output a report file named with the batch job submit time.
The batch job submit time is the time it joins the PBS batch job que.
I checked all PBS default variables but I didn't find anything related to the job submit time.
I would like to know how can I get this time without creating a new input variable.
Thanks.
I figured out this by myself.
Add the following function into your PBS batch job script to get the job submit time.
getsubmitdate(){
local datestring=`qstat -f $PBS_JOBID | grep -F qtime | awk '{for(i=3;i<8;i++) printf $i" "}'`;
local result=`date -d "$datestring" +%Y%m%d` ;
local outputvar=$1 ;
if [[ "$outputvar" ]] ; then
eval $outputvar="'$result'"
else
echo "$result"
fi
}
getsubmitdate SUBMITDATE
echo $SUBMITDATE

Resources