How to get the job_id in the Sun Grid system using qsub

Consider a script, "run.sh", to be sent to a cluster job queue via qsub,
qsub ./run.sh
My question is how do I get the number of the process -- the one that appears as ${PID} on the files *.o${PID} and *.e${PID} -- within the script run.sh?
Does qsub export it? Under which name?

Apparently the qsub man page does not mention it, but this page says that the variable $JOB_ID is set to the number I was asking for.
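A minimal sketch of how run.sh might use this, assuming the scheduler sets JOB_ID (and JOB_NAME) in the job's environment. The function name job_output_files and the fallback values "unknown" and "run.sh" are illustrative only, added so the script also runs outside the queue:

```shell
#!/bin/sh
# run.sh: report the job number that appears in the *.o and *.e file names.
# Assumption: the scheduler exports JOB_ID and JOB_NAME to the job.
job_output_files() {
    jof_id=${JOB_ID:-unknown}      # fallback is illustrative
    jof_name=${JOB_NAME:-run.sh}   # fallback is illustrative
    echo "$jof_name.o$jof_id $jof_name.e$jof_id"
}

echo "job id: ${JOB_ID:-unknown}"
echo "output files: $(job_output_files)"
```

Submitted with qsub ./run.sh, the two echo lines should end up in the job's .o file.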

Related

How to cancel a curl request in a shell script [duplicate]

How do I kill the last spawned background task in Linux?
Example:
doSomething
doAnotherThing
doB &
doC
doD
#kill doB
????
You can kill by job number. When you put a task in the background you'll see something like:
$ ./script &
[1] 35341
That [1] is the job number and can be referenced like:
$ kill %1
$ kill %% # Most recent background job
To see a list of job numbers use the jobs command. More from man bash:
There are a number of ways to refer to a job in the shell. The character % introduces a job name. Job number n may be referred to as %n. A job may also be referred to using a prefix of the name used to start it, or using a substring that appears in its command line. For example, %ce refers to a stopped ce job. If a prefix matches more than one job, bash reports an error. Using %?ce, on the other hand, refers to any job containing the string ce in its command line. If the substring matches more than one job, bash reports an error. The symbols %% and %+ refer to the shell's notion of the current job, which is the last job stopped while it was in the foreground or started in the background. The previous job may be referenced using %-. In output pertaining to jobs (e.g., the output of the jobs command), the current job is always flagged with a +, and the previous job with a -. A single % (with no accompanying job specification) also refers to the current job.
There's a special variable for this in bash:
kill $!
$! expands to the PID of the last process executed in the background.
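As a rough sketch of the $! approach applied to the doB example from the question (sleep 300 is only a stand-in for doB, and the variable names are illustrative):

```shell
#!/bin/sh
# Stand-in for doB: a long-running background task.
sleep 300 &
bg_pid=$!    # PID of the most recently started background process

# ... doC and doD would run here ...

kill "$bg_pid"    # send SIGTERM to the background task

# Reap the task; its exit status is non-zero because a signal ended it.
status=0
wait "$bg_pid" 2>/dev/null || status=$?
echo "background task ended with status $status"
```

Saving $! right after starting the task matters, because any later background command overwrites it.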
The following command gives you a list of all background processes in your session, along with the pid. You can then use it to kill the process.
jobs -l
Example usage:
$ sleep 300 &
$ jobs -l
[1]+ 31139 Running sleep 300 &
$ kill 31139
This should kill all background processes:
jobs -p | xargs kill -9
skill doB
skill is a version of the kill command that lets you select one or more processes based on given criteria.
You need its pid... use "ps -A" to find it.
This is an off-topic answer, but it may be valuable for those who are interested.
As in @John Kugelman's answer, % is related to job specification.
How can you find that efficiently in the manual? Use less's &pattern command (man seems to use less as its pager, though I'm not sure). In man bash, type &% and press Enter to show only the lines containing '%'. To show all lines again, type & and press Enter.
Just use the killall command:
killall taskname
For more info and more advanced options, type "man killall".

What does percent sign % do in "kill %vmtouch"?

I came across this shell script
bash# while true; do
vmtouch -m 10000000000 -l *head* & sleep 10m
kill %vmtouch
done
and wonder how the kill %vmtouch portion works.
I normally pass a PID to kill a process, but how does %vmtouch resolve to a PID?
I tried to run portions of the script separately, but I got this error:
-bash: kill: %vmtouch: no such job
%something is not a general shell script feature, but syntax used by the kill, fg and bg builtin commands to identify jobs. It searches the list of the shell's active jobs for the given string, and then signals that.
Here's man bash searching for /jobspec:
The character % introduces a job specification (jobspec).
Job number n may be referred to as %n. A job may also be referred to using a prefix of the name used to start it, or using a substring that appears in its command line. [...]
So if you do:
sleep 30 &
cat &
You can use things like %sleep or %sl to conveniently refer to the first one without having to find or remember its PID or job number.
You should look at the Job control section of the man bash page. The character % introduces a job specification (jobspec). Ideally when you have started this background job, you should have seen an entry in the terminal
[1] 25647
where 25647 is some random number I used. The line above means that the background job was assigned job number 1 and that 25647 is its process ID (for a pipeline, the process ID of the last process).
The error you saw does not mean that %vmtouch is invalid: a jobspec may name a job by a prefix of its command, but it only matches jobs of the current shell, so running the kill line separately finds no such job. Because this is the first background job, it can also be referred to as %1, so the kill command could have been written as below, which is the same as writing kill 25647
vmtouch -m 10000000000 -l *head* & sleep 10m
kill %1
That said, instead of relying on jobspec IDs, you can use the process ID of the background job, which is stored in the special shell variable $!, like this:
vmtouch -m whatever -l *head* & vmtouch_pid=$!
sleep 10m
kill "$vmtouch_pid"
See Job Control Basics from the GNU bash man page.

How can I tell if a PBS script was called by bash or qsub

I have a PBS script that processes several environment variables. PBS is a wrapper for bash that sends the bash script to a job scheduling queue. The processed variables form a command to run a scientific application. A PBS script is written in bash with additional information for the job scheduler encoded in the bash comments.
How can I determine programmatically if my script was called by qsub, the command that interprets PBS scripts, or if it was called by bash?
If the script is running under bash I would like to treat the call as a dry run and only print out the command that was generated. In that way it bypasses the job queue entirely.
This may not be completely robust, but one heuristic that may work is to test for the existence of any of the following environment variables, which tend to be defined under qsub, as listed here.
PBS_O_HOST (the name of the host upon which the qsub command is running)
PBS_SERVER (the hostname of the pbs_server which qsub submits the job to)
PBS_O_QUEUE (the name of the original queue to which the job was submitted)
PBS_O_WORKDIR (the absolute path of the current working directory of the qsub command)
PBS_ARRAYID (each member of a job array is assigned a unique identifier)
PBS_ENVIRONMENT (set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job)
PBS_JOBID (the job identifier assigned to the job by the batch system)
PBS_JOBNAME (the job name supplied by the user)
PBS_NODEFILE (the name of the file containing the list of nodes assigned to the job)
PBS_QUEUE (the name of the queue from which the job was executed)
PBS_WALLTIME (the walltime requested by the user or default walltime allotted by the scheduler)
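A minimal sketch of this heuristic for the dry-run case, assuming that PBS_ENVIRONMENT or PBS_JOBID is set only when the script runs under the batch system. The function names running_under_pbs and run_or_print are hypothetical, and the echo command stands in for the generated application command:

```shell
#!/bin/sh
# running_under_pbs: succeed if PBS-style job variables are present.
# Assumption: these variables are only defined for jobs started via qsub.
running_under_pbs() {
    [ -n "${PBS_ENVIRONMENT:-}" ] || [ -n "${PBS_JOBID:-}" ]
}

# run_or_print CMD [ARGS...]: run the command under PBS, otherwise print it.
run_or_print() {
    if running_under_pbs; then
        "$@"
    else
        printf 'dry run:'
        printf ' %s' "$@"
        printf '\n'
    fi
}

# Placeholder for the generated scientific command.
run_or_print echo "hello from the job"
```

Because the check only looks at the environment, exporting one of these variables by hand would defeat it, which is the "not completely robust" part.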
You can check the parent caller of bash:
CALLER=$(ps -p "$PPID" -o comm=)
# EXPECTED_CALLER is a placeholder: set it to the process name you expect as the parent
if [[ $CALLER == "$EXPECTED_CALLER" ]]; then
    echo "script was called by qsub or something"
fi
Extra note: Bash always has an unexported variable set: $BASH_VERSION so if it's set you'd be sure that the script is running with Bash. The question left would just be about which one called it.
Also, don't run the check inside a subshell () as you probably would get from $PPID the process of same shell, not the caller.
If your script is called through deeper levels of processes, so that $PPID alone is not enough, you can recursively scan the parent PIDs with ps -p <pid> -o ppid=.
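The parent scan could be sketched roughly as follows. The function name has_ancestor is hypothetical, and "pbs_mom" is only an assumed process name; check what your scheduler actually runs as the parent of a job. It also assumes a ps that supports -p and -o, and that comm= prints a bare command name (some systems print a full path):

```shell
#!/bin/sh
# has_ancestor NAME: succeed if any ancestor of this shell has command name NAME.
# The body runs in a subshell so its variables stay local.
has_ancestor() (
    wanted=$1
    pid=$PPID
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # Look up the command name of the current ancestor.
        comm=$(ps -p "$pid" -o comm= 2>/dev/null) || exit 1
        [ "$comm" = "$wanted" ] && exit 0
        # Move one level up; an empty or zero PID ends the loop.
        pid=$(ps -p "$pid" -o ppid= 2>/dev/null | tr -d ' ')
    done
    exit 1
)

# "pbs_mom" is an assumption, not a confirmed process name.
if has_ancestor pbs_mom; then
    echo "called from the batch system"
else
    echo "called directly"
fi
```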

Running script on my local computer when jobs submitted by qsub on a server finish

I am submitting jobs via qsub to a server, and then want to analyze the results on the local machine after the jobs are finished. I can find a way to submit the analysis job on the server, but I don't know how to run that script on my local machine.
jobID=$(qsub job.sh)
qsub -W depend=afterok:$jobID analyze.sh
But instead of the above, I want something like
if(qsub -W depend=afterok:$jobID) finished successfully
sh analyze.sh
else
some script
How can I accomplish the above task?
Thank you very much.
I've faced a similar issue and I'll try to sketch the solution that worked for me:
After submitting your actual job,
jobID=$(qsub job.sh)
I would create a loop in your script that checks if the job is still running using
qstat $jobID | grep $jobID | awk '{print $5}'
I'm not 100% sure that the status is in the 5th column, so you had better double-check. While the job is idling, the status will be I or Q; while running, R; and afterwards, C.
Once it's finished, I usually grep the output files for signs that the run was a success or not, and then run the appropriate post-processing script.
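A rough sketch of such a polling loop, under the assumptions that the state is in the 5th column of the qstat line mentioning the job, and that an empty result means qstat no longer knows the job. The function names job_state and wait_for_job and the 30-second default interval are illustrative:

```shell
#!/bin/sh
# job_state JOBID: print the state column that qstat reports for JOBID.
# Assumption: the state is the 5th field; verify this on your system.
job_state() {
    qstat "$1" 2>/dev/null | awk -v id="$1" 'index($0, id) { print $5 }'
}

# wait_for_job JOBID [INTERVAL]: poll until the job is completed (C)
# or no longer listed. Runs in a subshell to keep its variables local.
wait_for_job() (
    interval=${2:-30}
    while :; do
        state=$(job_state "$1")
        case $state in
            ''|C) exit 0 ;;
        esac
        sleep "$interval"
    done
)
```

A script on the submitting machine could then run jobID=$(qsub job.sh), call wait_for_job "$jobID", and inspect the output files before choosing between analyze.sh and another script.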
One thing that works for me is to use qsub synchronous with the option
qsub -sync y job.sh
either on the command line or as
#$ -sync y
in the script (job.sh) itself.
qsub will then exit with code 0 only if the job (or all array jobs) have finished successfully.
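With that exit code, the if/else from the question could be sketched as below. The function name run_then_analyze is hypothetical, and the third argument is a placeholder for the "some script" branch; the sketch assumes a qsub that supports -sync y:

```shell
#!/bin/sh
# run_then_analyze JOB ANALYSIS FALLBACK
# Submit JOB synchronously, then run ANALYSIS on success or FALLBACK on failure.
run_then_analyze() {
    if qsub -sync y "$1"; then
        sh "$2"
    else
        sh "$3"
    fi
}
```

For example, run_then_analyze job.sh analyze.sh other.sh blocks until the job ends and then runs exactly one of the two local scripts.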

"qsub -now" equivalent using bsub

In SGE, we have
qsub -now yes/no <command>
With "-now yes", the job is scheduled immediately (if possible) or not at all; it is never put in the pending queue.
With "-now no", the job is put in the pending queue if it cannot be executed immediately.
In LSF, the equivalent of qsub is bsub.
With bsub, the job is put in the pending queue if it cannot be executed immediately. There is no option like "-now yes" as in qsub.
Do we have something in bsub like "qsub -now"?
P.S.: One solution is to check for some time (a few seconds) after running bsub whether the job has been scheduled, and exit if not. I am searching for a more elegant way.
I found the answer in an LSF way.
LSF does provide a way to quit a job if it is unable to schedule the resource. There is an environment variable, LSF_NIOS_PEND_TIMEOUT (specified in minutes), which terminates the job if it is still in the pending queue after that time.
env LSF_NIOS_PEND_TIMEOUT=1 bsub -Is -m host /bin/bash
From somewhere on the web:
LSF_NIOS_PEND_TIMEOUT
Syntax
LSF_NIOS_PEND_TIMEOUT=minutes
Description
Applies only to interactive batch jobs.
Maximum amount of time that an interactive batch job can remain pending.
If this parameter is defined, and an interactive batch job is pending for longer than the specified time, the interactive batch job is terminated.
Valid values
Any integer greater than zero
LSF doesn't have the same thing. You could use expect with a timeout. LSF will output something like this when the job starts, so your expect script could expect <<Starting on. (But this is basically what your P.S. says.)
$ bsub -Is -m hostA /bin/bash
Job <7536> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on hostA>>
hostA$
You could maybe use lsrun. But it won't work with the batch system to allocate a slot or other resource.
