Use of a HEREDOC with SLURM sbatch --wrap - bash

I am encountering difficulties using a (Bash) HEREDOC with a SLURM sbatch submission, via --wrap.
I would like the following to work:
SBATCH_PARAMS=('--nodes=1' '--time=24:00:00' '--mem=64000' '--mail-type=ALL')
sbatch "${SBATCH_PARAMS[@]}" --job-name="MWE" -o "MWE.log" --wrap <<EOF
SLURM_CPUS_ON_NODE=\${SLURM_CPUS_ON_NODE:-8}
SLURM_CPUS_PER_TASK=\${SLURM_CPUS_PER_TASK:-\$SLURM_CPUS_ON_NODE}
export OMP_NUM_THREADS=\$SLURM_CPUS_PER_TASK
parallel --joblog "MWE-jobs.log" --resume --resume-failed -k --linebuffer -j \$((\$OMP_NUM_THREADS/4)) --link "MWE.sh {1} {2}" ::: "./"*R1*.fastq.gz ::: "./"*R2*.fastq.gz
EOF
On my current cluster, sbatch returns the following error and refuses to submit the job:
ERROR: option --wrap requires argument
Might anyone know how I can get this to work?

Since --wrap expects a string argument, you can't use a heredoc directly. A heredoc supplies text on a command's standard input; it does not become a command-line argument.
Feed the heredoc to cat, which reads standard input when it is given no filename, and use its output as the string that --wrap expects:
SBATCH_PARAMS=('--nodes=1' '--time=24:00:00' '--mem=64000' '--mail-type=ALL')
sbatch "${SBATCH_PARAMS[@]}" --job-name="MWE" -o "MWE.log" --wrap "$(cat <<EOF
SLURM_CPUS_ON_NODE=\${SLURM_CPUS_ON_NODE:-8}
SLURM_CPUS_PER_TASK=\${SLURM_CPUS_PER_TASK:-\$SLURM_CPUS_ON_NODE}
export OMP_NUM_THREADS=\$SLURM_CPUS_PER_TASK
parallel --joblog "MWE-jobs.log" --resume --resume-failed -k --linebuffer -j \$((\$OMP_NUM_THREADS/4)) --link "MWE.sh {1} {2}" ::: "./"*R1*.fastq.gz ::: "./"*R2*.fastq.gz
EOF
)"
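To see why attaching the heredoc straight to --wrap fails, it can help to look at where the heredoc text ends up. The sketch below uses a hypothetical helper, show_input (not part of SLURM), that records how many arguments it received and whether anything arrived on standard input:

```shell
#!/bin/bash
# show_input is a hypothetical stand-in for a program that, like
# sbatch --wrap, wants its payload as an argument.
show_input() {
    local stdin_text
    stdin_text=$(cat)                      # drain standard input
    if [ -n "$stdin_text" ]; then
        last_report="args=$# stdin=present"
    else
        last_report="args=$# stdin=empty"
    fi
}

# Heredoc attached directly: no argument is passed, the text is on stdin.
show_input <<'EOF'
echo hello
EOF
direct=$last_report

# Heredoc read by cat inside a quoted command substitution: one argument.
show_input "$(cat <<'EOF'
echo hello
EOF
)" </dev/null
wrapped=$last_report

echo "direct:  $direct"
echo "wrapped: $wrapped"
```

In the first call the helper sees zero arguments, which is the situation sbatch reports as "option --wrap requires argument".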

You can just use the heredoc without the wrap provided you add the #!/bin/bash at the top of it.
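A minimal sketch of that idea, assuming nothing beyond sbatch reading the job script from standard input when no script file is named. DRY_RUN and submit are hypothetical helpers added here so the example can be tried without a cluster, and the job body is shortened for illustration:

```shell
#!/bin/bash
# DRY_RUN and submit are hypothetical helpers for this sketch. With
# DRY_RUN=1 (the default here) the script is printed instead of submitted.
DRY_RUN=${DRY_RUN:-1}
submit() {
    if [ "$DRY_RUN" = 1 ]; then
        cat                 # show what sbatch would read from stdin
    else
        sbatch "$@"         # sbatch reads the script from stdin when no file is given
    fi
}

# The quoted delimiter ('EOF') keeps $... literal, so no backslashes are needed.
job_script=$(submit --nodes=1 --job-name="MWE" -o "MWE.log" <<'EOF'
#!/bin/bash
SLURM_CPUS_ON_NODE=${SLURM_CPUS_ON_NODE:-8}
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-$SLURM_CPUS_ON_NODE}
echo "threads: $OMP_NUM_THREADS"
EOF
)
echo "$job_script"
```

Setting DRY_RUN=0 would pass the same heredoc to sbatch itself.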

Adapting a related post on assigning a HEREDOC to a variable, but using cat instead (since I use errexit and want to avoid working around the non-zero exit value of read), I was able to submit my job as follows:
CMD_FOR_SUB=$(cat <<EOF
SLURM_CPUS_ON_NODE=\${SLURM_CPUS_ON_NODE:-8}
SLURM_CPUS_PER_TASK=\${SLURM_CPUS_PER_TASK:-\$SLURM_CPUS_ON_NODE}
export OMP_NUM_THREADS=\$SLURM_CPUS_PER_TASK
parallel --joblog "MWE-jobs.log" --resume --resume-failed -k --linebuffer -j \$((\$OMP_NUM_THREADS/4)) --link "MWE.sh {1} {2}" ::: "./"*R1*.fastq.gz ::: "./"*R2*.fastq.gz
EOF
)
sbatch "${SBATCH_PARAMS[@]}" --job-name="MWE" -o "MWE.log" --wrap "$CMD_FOR_SUB"
While this does appear to work, I would still prefer a solution that allows sbatch to directly accept the HEREDOC.
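The errexit concern can be reproduced locally. The following sketch compares the cat form with a read -d '' form; the two-line payload is made up for illustration:

```shell
#!/bin/bash
set -o errexit

# Form used above: cat inside a command substitution exits with status 0.
cmd_cat=$(cat <<'EOF'
echo one
echo two
EOF
)

# read -d '' reads until end of input and then returns non-zero, which
# would abort the script under errexit unless it is guarded with || true.
IFS= read -r -d '' cmd_read <<'EOF' || true
echo one
echo two
EOF

# The read form keeps the final newline; command substitution strips it.
printf '%s\n' "$cmd_cat"
printf '%s' "$cmd_read"
```

Apart from the trailing newline, both variables hold the same text, so the choice is mostly about how the exit status interacts with errexit.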

Related

output redirection inside bsub command

Is it possible to use output redirection inside bsub command such as:
bsub -q short "cat <(head -2 myfile.txt) > outputfile.txt"
Currently this bsub execution fails. My attempts to escape the redirection sign and the parentheses also failed, for example:
bsub -q short "cat \<\(head -2 myfile.txt\) > outputfile.txt"
bsub -q short "cat <\(head -2 myfile.txt\) > outputfile.txt"
*Note, I'm well aware that the redirection in this simple command is not necessary as the command could easily be written as:
bsub -q short "head -2 myfile.txt > outputfile.txt"
and then it would indeed be executed properly (without errors). I am, however, interested in using the '<' redirection within a more complex command, and present this simple command only as an example.
<(...) is process substitution -- a bash extension not available on baseline POSIX shells. system(), subprocess.Popen(..., shell=True) and similar calls use /bin/sh, which is not guaranteed to have such extensions.
As a mechanism that works with any possible command without needing to worry about how to correctly escape it into a string, you can define the command as a function and export it, along with any variables it uses, through the environment:
# for the sake of example, moving filenames out-of-band
in_file=myfile.txt
out_file=outputfile.txt
mycmd() { cat <(head -2 <"$in_file") >"$out_file"; }
export -f mycmd # export the function into the environment
export in_file out_file # and also any variables it uses
bsub -q short 'bash -c mycmd' # ...before telling bsub to invoke bash to run the function
<(...) is a bash feature while your command runs with sh.
Invoke bash explicitly to handle your bash-only features:
bsub -q short "bash -c 'cat <(head -2 myfile.txt) > outputfile.txt'"
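Both suggestions can be checked locally before involving bsub. This is only a sketch: the temporary directory and the three-line input file are invented for the test, and the second variant passes the paths as arguments to bash -c rather than splicing them into the string:

```shell
#!/bin/bash
# Local check without bsub: the temporary directory and file contents are
# made up for this sketch.
workdir=$(mktemp -d)
in_file="$workdir/myfile.txt"
out_file="$workdir/outputfile.txt"
printf 'line1\nline2\nline3\n' >"$in_file"

# 1) Exported function, run by a fresh bash process.
mycmd() { cat <(head -n 2 <"$in_file") >"$out_file"; }
export -f mycmd
export in_file out_file
bash -c mycmd
from_function=$(cat "$out_file")

# 2) Command string handed to bash -c, with the paths passed as $1 and $2.
bash -c 'cat <(head -n 2 "$1") >"$2"' _ "$in_file" "$out_file"
from_string=$(cat "$out_file")

echo "$from_function"
rm -rf "$workdir"
```

If both variables contain the first two lines of the input, the command itself is fine and any remaining failure lies in how the scheduler invokes it.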

Bash: Execute command WITH ARGUMENTS in new terminal [duplicate]

So I want to open a new terminal in bash and execute a command with arguments.
As long as I use a command such as ls it works fine, but when I use a command with arguments, such as route -n, it doesn't work.
The code:
gnome-terminal --window-with-profile=Bash -e whoami #WORKS
gnome-terminal --window-with-profile=Bash -e route -n #DOESNT WORK
I already tried putting "" around the command and so on, but it still doesn't work.
You can start a new terminal with a command using the following:
gnome-terminal --window-with-profile=Bash -- \
bash -c "<command>"
To continue the terminal with the normal bash profile, add exec bash:
gnome-terminal --window-with-profile=Bash -- \
bash -c "<command>; exec bash"
Here's how to create a Here document and pass it as the command:
cmd="$(printf '%s\n' 'wc -w <<-EOF
First line of Here document.
Second line.
The output of this command will be '15'.
EOF' 'exec bash')"
xterm -e bash -c "${cmd}"
To open a new terminal and run an initial command with a script, add the following in a script:
nohup xterm -e bash -c "$(printf '%s\nexec bash' "$*")" &>/dev/null &
When $* is quoted, it expands the arguments to a single word, with each separated by the first character of IFS. nohup and &>/dev/null & are used only to allow the terminal to run in the background.
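The difference between "$*" and "$@" is easy to see in isolation. In this sketch count_args is a hypothetical helper, and set -- simulates the script being called with route -n:

```shell
#!/bin/bash
# count_args is a hypothetical helper that reports how many words it received.
count_args() { echo "$#"; }

set -- route -n                  # pretend the script was called with: route -n

star_count=$(count_args "$*")    # "$*" joins the arguments into one word
at_count=$(count_args "$@")      # "$@" keeps them as separate words
joined=$(IFS=,; echo "$*")       # the joiner is the first character of IFS

# The string that would be handed to bash -c in the xterm example.
cmd=$(printf '%s\nexec bash' "$*")

echo "$star_count $at_count $joined"
printf '%s\n' "$cmd"
```

Because bash -c takes the whole command as a single string, the joined form is the one that fits here.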
Try this:
gnome-terminal --window-with-profile=Bash -e 'bash -c "route -n; read"'
The final read prevents the window from closing after the previous commands finish. It closes when you press Enter.
If you want to experience headaches, you can try with more quote nesting:
gnome-terminal --window-with-profile=Bash \
-e 'bash -c "route -n; read -p '"'Press a key...'"'"'
(In the following examples there is no final read. Let’s suppose we fixed that in the profile.)
If you want to print an empty line and enjoy multi-level escaping too:
gnome-terminal --window-with-profile=Bash \
-e 'bash -c "printf \\\\n; route -n"'
The same, with another quoting style:
gnome-terminal --window-with-profile=Bash \
-e 'bash -c '\''printf "\n"; route -n'\'
Variables are expanded in double quotes, not single quotes, so if you want them expanded you need to ensure that the outermost quotes are double:
command='printf "\n"; route -n'
gnome-terminal --window-with-profile=Bash \
-e "bash -c '$command'"
Quoting can become really complex. When you need something more advanced than a couple of simple commands, it is advisable to write an independent shell script with all the readable, parametrized code you need, save it somewhere, say /home/user/bin/mycommand, and then invoke it simply as
gnome-terminal --window-with-profile=Bash -e /home/user/bin/mycommand

How to execute arbitrary command under `bash -c`

What is a procedure to decorate an arbitrary bash command to execute it in a subshell? I cannot change the command, I have to decorate it on the outside.
the best I can think of is
>bash -c '<command>'
works on these:
>bash -c 'echo'
>bash -c 'echo foobar'
>bash -c 'echo \"'
but what about the commands such as
echo \'
and especially
echo \'\"
The decoration has to be always the same for all commands. It has to always work.
You say "subshell" - you can get one of those by just putting parentheses around the command:
x=outer
(x=inner; echo "x=$x"; exit)
echo "x=$x"
produces this:
x=inner
x=outer
You could (ab)use heredocs:
bash -c "$(cat <<-EOF
echo \'\"
EOF
)"
This is one way without using -c option:
bash <<EOF
echo \'\"
EOF
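One detail matters if the decoration has to work for every command: with an unquoted delimiter the outer shell still expands $ and backticks in the heredoc body. Quoting the delimiter passes the text through untouched. A small sketch, where the variable where and the two wrapper functions are made up for illustration:

```shell
#!/bin/bash
where="submitting shell"

# Unquoted delimiter: the submitting shell expands $where before the inner
# bash ever sees the text.
run_unquoted() {
    env where="inner shell" bash <<EOF
echo \'\" "$where"
EOF
}

# Quoted delimiter: the text is passed through untouched and the inner
# bash expands $where itself.
run_quoted() {
    env where="inner shell" bash <<'EOF'
echo \'\" "$where"
EOF
}

unquoted=$(run_unquoted)
quoted=$(run_quoted)
echo "$unquoted"
echo "$quoted"
```

The first line of output names the submitting shell and the second the inner shell, so for arbitrary commands the quoted form is probably the safer default.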
What you want to do is exactly the same as escapeshellcmd() in PHP (http://php.net/manual/fr/function.escapeshellcmd.php)
You just need to escape #&;`|*?~<>^()[]{}$\, \x0A and \xFF. ' and " are escaped only if they are not paired.
But beware of security issues...
Let bash take care of it this way:
1) prepare the command as an array:
astrCmd=(echo \'\");
2) export the array as a simple string:
export EXPORTEDastrCmd="`declare -p astrCmd| sed -r "s,[^=]*='(.*)',\1,"`";
3) restore the array and run it as a full command:
bash -c "declare -a astrCmd='$EXPORTEDastrCmd';\${astrCmd[@]}"
You can wrap these steps in a function:
FUNCbash(){
astrCmd=("$@");
export EXPORTEDastrCmd="`declare -p astrCmd| sed -r "s,[^=]*='(.*)',\1,"`";
bash -c "declare -a astrCmd='$EXPORTEDastrCmd';\${astrCmd[@]}";
}
FUNCbash echo \'\"
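A different technique with the same goal is to let printf %q do the quoting. This is not the method from the answer above, only a sketch of an alternative; run_in_bash is a hypothetical name:

```shell
#!/bin/bash
# run_in_bash is a hypothetical name. printf %q (a bash builtin feature)
# quotes each argument so that bash can read it back unchanged.
run_in_bash() {
    local quoted
    printf -v quoted '%q ' "$@"
    bash -c "$quoted"
}

result=$(run_in_bash echo \'\")
spaced=$(run_in_bash printf '%s|' 'a b' 'c$d')
echo "$result"
echo "$spaced"
```

Like FUNCbash, this takes the command as separate words, so it does not help when the command is only available as one unparsed string.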

Directly pass parameters to pbs script

Is there a way to directly pass parameters to a .pbs script before submitting a job? I need to loop over a list of files indicated by different numbers and apply a script to analyze each file.
The best I've been able to come up with is the following:
#!/bin/bash
for ((i= 1; i<= 10; i++))
do
export FILENUM=$i
qsub pass_test.pbs
done
where pass_test.pbs is the following script:
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $FILENUM
But this feels a bit wonky. Particularly, I want to avoid having to create an environment variable to handle this.
The qsub utility can read the script from the standard input, so by using a here document you can create scripts on the fly, dynamically:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -
#!/bin/sh
#PBS -V
#PBS -S /bin/sh
#PBS -N pass_test
#PBS -l nodes=1:ppn=1,walltime=00:02:00
#PBS -M XXXXXX@XXX.edu
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
Personally, I would use a more compact version:
#!/bin/sh
for i in `seq 1 10`
do
cat <<EOS | qsub -V -S /bin/sh -N pass_test -l nodes=1:ppn=1,walltime=00:02:00 -M XXXXXX@XXX.edu -
cd /scratch/XXXXXX/pass_test
./run_test $i
EOS
done
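To inspect what such a loop would submit, the heredoc can be printed instead of piped to qsub. In this sketch generate_job is a hypothetical helper and the PBS header is shortened:

```shell
#!/bin/bash
# generate_job is a hypothetical helper: it prints the script that the loop
# would pipe into qsub, so the expansion of the loop variable is visible.
generate_job() {
    cat <<EOS
#!/bin/sh
#PBS -N pass_test
cd /scratch/XXXXXX/pass_test
./run_test $1
EOS
}

job3=$(generate_job 3)
last_line=$(printf '%s\n' "$job3" | tail -n 1)
echo "$job3"
```

Because the delimiter is unquoted, the number is substituted when the script is generated, so no environment variable has to reach the job.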
You can use the -F option, as described here:
-F
Specifies the arguments that will be passed to the job script when the script is launched. The accepted syntax is:
qsub -F "myarg1 myarg2 myarg3=myarg3value" myscript2.sh
Note: Quotation marks are required. qsub will fail with an error
message if the argument following -F is not a quoted value. The
pbs_mom server will pass the quoted value as arguments to the job
script when it launches the script.
See also this answer
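On the script side, arguments passed this way would arrive as ordinary positional parameters. The sketch below runs a temporary script locally with sh to imitate that; the echo stands in for the real ./run_test call, and the value 7 is arbitrary:

```shell
#!/bin/bash
# Local sketch: the script below reads its parameter from $1, which is
# where arguments given with qsub -F would arrive. The file is temporary and
# the echo stands in for the real ./run_test call.
script=$(mktemp)
cat >"$script" <<'EOS'
#!/bin/sh
FILENUM=$1
echo "would run: ./run_test $FILENUM"
EOS

# Roughly what the job would see after: qsub -F "7" pass_test.pbs
local_output=$(sh "$script" 7)
echo "$local_output"
rm -f "$script"
```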
If you only need to pass numbers and run a list of jobs with the same command except for the input file number, it's better to use a job array instead of a for loop, because a job array places less load on the job scheduler.
To do so, refer to the file number through PBS_ARRAYID in the pbs file:
./run_test ${PBS_ARRAYID}
To submit it, type on the command line:
qsub -t 1-10 pass_test.pbs
where the range after the -t option specifies which array IDs to use.

How to properly pass on an environment variable to Sun Grid Engine?

I'm trying to submit a (series of) jobs to SGE (FWIW, it's a sequence of Gromacs molecular dynamics simulations), in which all the jobs are identical except for a suffix, such as input01, input02, etc. I wrote the commands to run in a way that the suffix is properly handled by the sequence of commands.
However, I can't find a way to get the exec environment to receive that variable. According to the qsub man page, -v var should do it.
$ export i=19
$ export | grep ' i='
declare -x i="19"
$ env | grep '^i='
i=19
Then, I submit the following script (run.sh) to see if it's received:
if [ "x" == "x$i" ]; then
echo "ERROR: \$i not set"
else
echo "SUCCESS: \$i is set"
fi
I submit the job as follows (in the same session as the export command above):
$ qsub -N "test_env" -cwd -v i run.sh
Your job 4606 ("test_env") has been submitted
The error stream is empty, and the output stream has:
$ cat test_env.o4606
ERROR: $i not set
I also tried the following commands, unsuccessfully:
$ qsub -N "test_env" -cwd -v i -V run.sh
$ qsub -N "test_env" -cwd -V run.sh
$ qsub -N "test_env" -cwd -v i=19 -V run.sh
$ qsub -N "test_env" -cwd -v i=19 run.sh
If I add a line i=19 to the beginning of run.sh, then the output is:
$ cat test_env.o4613
SUCCESS: $i is set as 19
I'm now considering generating a single file per job, which will essentially be the same but will have an i=xx line as the first. It doesn't look very much practical, but it would be a solution.
Would there be a better solution?
What I've always done is the following:
##send.sh
export a=10
qsub ./run.sh
and the script run.sh:
##run.sh
#$ -V
echo $a
When I call send.sh, the .o file contains 10.
Assuming that your variable is just an incrementing counter: You can use Array Jobs to achieve this. This will set an $SGE_TASK_ID environment variable to the count which you can then copy to $i or use directly.
If the variable is anything else, then I think you'll have to generate multiple job scripts and submit each; that's the "solution" I use when I have loads of jobs with differing parameters.
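For suffixes such as input01 and input02, the task ID has to be zero-padded before use. A minimal sketch, assuming the job is submitted as an array job so that SGE_TASK_ID is set; input_name_for is a hypothetical helper and the loop only imitates three tasks locally:

```shell
#!/bin/bash
# input_name_for is a hypothetical helper; inside a real array job the
# argument would be "$SGE_TASK_ID".
input_name_for() {
    printf 'input%02d\n' "$1"
}

# Local stand-in for tasks 1, 2 and 19 of an array job.
for task_id in 1 2 19; do
    input_name_for "$task_id"
done
first=$(input_name_for 1)
```

Inside run.sh this would become something like i=$(printf '%02d' "$SGE_TASK_ID"), after which the existing commands can keep using $i.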
I'm not certain you can pass variables by their name through qsub. I've had success with passing values (you should probably write a front-end script for this instead of doing it interactively):
$ export ii=19
$ qsub -N "test_env" -cwd -v i=$ii run.sh

Resources