How to properly pass on an environment variable to Sun Grid Engine? - bash

I'm trying to submit a (series of) jobs to SGE (FWIW, it's a sequence of Gromacs molecular dynamics simulations), in which all the jobs are identical except for a suffix, such as input01, input02, etc. I wrote the commands to run in a way that the suffix is properly handled by the sequence of commands.
However, I can't find a way to get the execution environment to receive that variable. According to the qsub man page, -v var should do it.
$ export i=19
$ export | grep ' i='
declare -x i="19"
$ env | grep '^i='
i=19
Then, I submit the following script (run.sh) to see if it's received:
if [ "x" == "x$i" ]; then
echo "ERROR: \$i not set"
else
echo "SUCCESS: \$i is set"
fi
I submit the job as follows (in the same session as the export command above):
$ qsub -N "test_env" -cwd -v i run.sh
Your job 4606 ("test_env") has been submitted
The error stream is empty, and the output stream has:
$ cat test_env.o4606
ERROR: $i not set
I also tried the following commands, unsuccessfully:
$ qsub -N "test_env" -cwd -v i -V run.sh
$ qsub -N "test_env" -cwd -V run.sh
$ qsub -N "test_env" -cwd -v i=19 -V run.sh
$ qsub -N "test_env" -cwd -v i=19 run.sh
If I add a line i=19 to the beginning of run.sh, then the output is:
$ cat test_env.o4613
SUCCESS: $i is set as 19
I'm now considering generating one script file per job, each essentially identical except for an i=xx line at the top. It doesn't look very practical, but it would be a solution.
Would there be a better solution?

What I've always done is the following:
##send.sh
export a=10
qsub ./run.sh
and the script run.sh:
##run.sh
#$ -V
echo $a
When I call send.sh, the .o output file contains 10.

Assuming that your variable is just an incrementing counter: you can use Array Jobs to achieve this. This will set the $SGE_TASK_ID environment variable to the count, which you can then copy to $i or use directly.
If the variable is anything else, then I think you'll have to generate multiple job scripts and submit each; that's the "solution" I use when I have loads of jobs with differing parameters.
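For illustration, a minimal array-job sketch (the array.sh wrapper name and the 1-19 task range are assumptions, not from the question; SGE sets $SGE_TASK_ID for each task):
#!/bin/bash
#$ -N test_env_array
#$ -cwd
# SGE sets SGE_TASK_ID to this task's index within the array
i=$SGE_TASK_ID
echo "running task with i=$i"
Submit it with qsub -t 1-19 array.sh and each task sees its own value of $i.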

I'm not certain you can pass variables by their name through qsub. I've had success with passing values (you should probably write a front-end script for this instead of doing it interactively):
$ export ii=19
$ qsub -N "test_env" -cwd -v i=$ii run.sh

Related

Replacing 'source file' with its content, and expanding variables, in bash

In a script.sh,
source a.sh
source b.sh
CMD1
CMD2
CMD3
how can I replace the source *.sh with their content (without executing the commands)?
I would like to see what the bash interpreter executes after sourcing the files and expanding all variables.
I know I can use set -n -v or run bash -n -v script.sh 2>output.sh, but that would not replace the source commands (and even less so if a.sh or b.sh contain variables).
I thought of using a subshell, but that still doesn't expand the source lines. I tried a combination of set +n +v and set -n -v before and after the source lines, but that still does not work.
I'm going to send that output to a remote machine using ssh.
I could use < output.sh to feed the content into the ssh command, but I can't log in as root on the remote machine; I am, however, a sudoer.
Therefore, I thought I could create the script and send it as a base64-encoded string (using that clever trick):
base64 script | ssh remotehost 'base64 -d | sudo bash'
Is there a solution?
Or do you have a better idea?
You can do something like this:
inline.sh:
#!/usr/bin/env bash
# Replace "source file" / ". file" lines with the contents of the sourced file
while IFS= read -r line; do
    if [[ "$line" =~ ^[[:space:]]*(\.|source)[[:space:]]+ ]]; then
        # Take the second word on the line as the sourced file name
        file="$(echo "$line" | cut -d' ' -f2)"
        cat "$file"
    else
        printf '%s\n' "$line"
    fi
done < "$1"
Note this assumes the sourced files exist, and doesn't handle errors. You should also handle possible hashbangs. If the sourced files themselves contain source lines, you need to apply the script recursively, e.g. something like (not tested):
while grep -Eq '^[[:space:]]*(source|\.)[[:space:]]' main.sh; do
    bash inline.sh main.sh > main.sh.tmp && mv main.sh.tmp main.sh
done
Let's test it
main.sh:
source a.sh
. b.sh
echo cc
echo "$var_a $var_b"
a.sh:
echo aa
var_a="stack"
b.sh:
echo bb
var_b="overflow"
Result:
bash inline.sh main.sh
echo aa
var_a="stack"
echo bb
var_b="overflow"
echo cc
echo "$var_a $var_b"
bash inline.sh main.sh | bash
aa
bb
cc
stack overflow
BTW, if you just want to see what bash executes, you can run
bash -x [script]
or remotely
ssh user@host -t "bash -x [script]"

Using for loop with qsub for batch job submission

Could I please be advised how I could use a for loop to qsub files for batch job submission?
At the moment, this only works if I submit a single file for job submission using the command:
qsub -v /path/to/file.txt script.sh
However if I run a for loop through files using the following commands:
files=`pwd`/*pattern* (#This gives a list of files containing a certain common title)
for i in $files;
do
qsub -v $i script.sh
done
This always gets rejected with the error that the file.txt was not provided.
I have double checked if $i from the for loop is providing the right file.txt by doing:
for i in $files;
do
echo $i
done
and this works out fine. As such I am unsure why the for loop with qsub is not working. Could I please get advice on how I could alter the code to get it to work?
Thanks.
Using -v requires you to give the variable a name: qsub -v filepath=$i script.sh. You can then access the file path inside script.sh as $filepath.
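Applied to the loop above, a minimal sketch (assuming the same *pattern* glob and a script.sh that reads $filepath):
#!/bin/bash
# Submit one job per matching file, passing its path as $filepath
for i in "$PWD"/*pattern*; do
    qsub -v filepath="$i" script.sh
done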

How to submit a cat command to a cluster using qsub and use the pipe correctly

I want to submit several "cat jobs" on the fly to a cluster using qsub. Currently I'm concatenating several files with cat into a single one, using > output_file at the end of the command.
The problem is that qsub treats the > output_file part as belonging to the qsub invocation rather than to the submitted command, so output_file ends up with the job log instead of the cat output.
qsub -b y -cwd -q bigmem cmd1
where cmd1 looks like:
cat file1 file2 filen > output_file
As an alternative to dbeer's answer, if your code is disposable, you can use echo:
echo "cat file1 file2 ... filen > outfile" | qsub -cwd <options>
When a job is running via PBS, stdout is redirected to the output file of the job, so the way to do this is to write a script:
#!/bin/bash
cat file1 file2 ... filen
You don't need to redirect the output to a file because the mom daemon will do that for you when setting up the job; you just need to specify the output file you want using -o. For example, if you named the above script script.sh (make sure it is executable) you'd submit:
qsub -b y -q bigmem -o output_file script.sh

Get the name of the caller script in bash script

Let's assume I have 3 shell scripts:
script_1.sh
#!/bin/bash
./script_3.sh
script_2.sh
#!/bin/bash
./script_3.sh
The problem is that in script_3.sh I want to know the name of the caller script,
so that I can respond differently to each caller I support.
Please don't assume I'm asking about $0, because $0 will echo script_3 every time, no matter who the caller is.
Here is an example input with expected output:
./script_1.sh should echo script_1
./script_2.sh should echo script_2
./script_3.sh should echo user_name or root or anything else that distinguishes it from the other two cases
Is that possible? And if so, how can it be done?
This is going to be added to a modified rm script, so that when I call rm it does something extra, but when git or any other CLI tool uses rm it is not affected by the modification.
Based on @user3100381's answer, here's a much simpler command to get the same thing which I believe should be fairly portable:
PARENT_COMMAND=$(ps -o comm= $PPID)
Replace comm= with args= to get the full command line (command + arguments). The = alone is used to suppress the headers.
See: http://pubs.opengroup.org/onlinepubs/009604499/utilities/ps.html
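Applied to the question's script_3.sh, a hedged sketch (the branch values assume comm reports the caller's script name, which can vary between systems and is truncated to 15 characters):
#!/bin/bash
# Name of the command that started our parent process, without header or arguments
PARENT_COMMAND=$(ps -o comm= $PPID)
case "$PARENT_COMMAND" in
    script_1.sh) echo "script_1" ;;
    script_2.sh) echo "script_2" ;;
    *)           echo "$USER" ;;  # run directly from a shell (or another caller)
esac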
In case you are sourcing the script instead of calling/executing it, there is no new process forked, and thus the solutions with ps won't work reliably.
Use bash built-in caller in that case.
$ cat h.sh
#! /bin/bash
function warn_me() {
echo "$#"
caller
}
$
$ cat g.sh
#!/bin/bash
source h.sh
warn_me "Error: You did not do something"
$
$ . g.sh
Error: You did not do something
g.sh
$
Source
The $PPID variable holds the parent process ID. So you could parse the output from ps to get the command.
#!/bin/bash
PARENT_COMMAND=$(ps $PPID | tail -n 1 | awk "{print \$5}")
Based on @J.L.'s answer, with more in-depth explanations; this works on Linux:
cat /proc/$PPID/comm
gives you the command name of the parent PID.
If you prefer the command with all its options, then:
cat /proc/$PPID/cmdline
Explanations:
$PPID is defined by the shell; it's the PID of the parent process.
In /proc/ (on Linux), there is one directory per running PID. So if you cat /proc/$PPID/comm, you print the command name of that PID.
Check man proc
A couple of useful files kept in /proc/$PPID:
/proc/*some_process_id*/exe: a symlink to the last executed command under *some_process_id*
/proc/*some_process_id*/cmdline: a file containing the last executed command under *some_process_id*, with null-byte-separated arguments
So, a slight simplification:
sed 's/\x0/ /g' "/proc/$PPID/cmdline"
If you have /proc:
$(cat /proc/$PPID/comm)
Declare this:
PARENT_NAME=`ps -ocomm --no-header $PPID`
Thus you'll get a nice variable $PARENT_NAME that holds the parent's name.
You can simply use the command below to avoid calling cut/awk/sed:
ps --no-headers -o command $PPID
If you only want the parent and none of the subsequent processes, you can use:
ps --no-headers -o command $PPID | cut -d' ' -f1
You could pass in a variable to script_3.sh to determine how to respond...
script_1.sh
#!/bin/bash
./script_3.sh script1
script_2.sh
#!/bin/bash
./script_3.sh script2
script_3.sh
#!/bin/bash
if [ "$1" = 'script1' ]; then
    echo "we were called from script1!"
elif [ "$1" = 'script2' ]; then
    echo "we were called from script2!"
fi

Directly pass parameters to pbs script

Is there a way to directly pass parameters to a .pbs script before submitting a job? I need to loop over a list of files indicated by different numbers and apply a script to analyze each file.
The best I've been able to come up with is the following:
#!/bin/sh
for ((i= 1; i<= 10; i++))
do
export FILENUM=$i
qsub pass_test.pbs
done
where pass_test.pbs is the following script:
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $FILENUM
But this feels a bit wonky. Particularly, I want to avoid having to create an environment variable to handle this.
The qsub utility can read the script from the standard input, so by using a here document you can create scripts on the fly, dynamically:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
Personally, I would use a more compact version:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -V -S /bin/sh -N pass_test -l nodes=1:ppn=1,walltime=00:02:00 -M XXXXXX@XXX.edu -
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
You can use the -F option, as described here:
-F
Specifies the arguments that will be passed to the job script when the script is launched. The accepted syntax is:
qsub -F "myarg1 myarg2 myarg3=myarg3value" myscript2.sh
Note: Quotation marks are required. qsub will fail with an error
message if the argument following -F is not a quoted value. The
pbs_mom server will pass the quoted value as arguments to the job
script when it launches the script.
See also this answer
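For this question, a hedged sketch of how that might look (assuming a Torque version that supports -F; the loop mirrors the original submit script):
#!/bin/sh
# Pass the file number to the job script as a positional argument
for i in `seq 1 10`
do
    qsub -F "$i" pass_test.pbs
done
Inside pass_test.pbs, ./run_test $FILENUM would then become ./run_test "$1", and the exported FILENUM environment variable is no longer needed.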
If you just need to pass numbers and run a list of jobs with the same command except for the input file number, it's better to use a job array instead of a for loop, as a job array puts less burden on the job scheduler.
To run, you specify the file number with PBS_ARRAYID like this in the .pbs file:
./run_test ${PBS_ARRAYID}
And to invoke it, on command line, type:
qsub -t 1-10 pass_test.pbs
where you can specify what array id to use after -t option
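Putting it together, a minimal sketch of an array version of pass_test.pbs (directives copied from the question; using #PBS -t as an in-script directive assumes Torque):
#!/bin/sh
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -t 1-10
cd /scratch/XXXXXX/pass_test
# PBS_ARRAYID is set to this task's number (1..10 here)
./run_test ${PBS_ARRAYID}
With the -t range given as a directive, a plain qsub pass_test.pbs should submit the whole array.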
