Why doesn't SLURM provide the nodes as requested? - cluster-computing

My cluster consists of 3 PCs (Raspbian with Slurm 18), all connected together with shared file storage mounted as /storage.
The task file is /storage/multiple_hello.sh:
#!/bin/bash
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=3
#SBATCH --ntasks=3
cd /storage
srun echo "Hello World from $(hostname)" >> ./"$SLURM_JOB_ID"_$(hostname).txt
It is run as sbatch /storage/multiple_hello.sh, and the expected outcome is the creation in /storage of 3 files named 120_node1.txt, 121_node2.txt and 122_node3.txt (arbitrary job numbers), since:
3 nodes were requested
3 tasks were requested
a limit of 1 task per node was set
Actual output: only one file was created: 120_node1.txt
How to make it work as intended?
Weirdly enough, srun --nodes=3 hostname works as expected and returns:
node1
node2
node3

To get the expected result, modify the last line as follows:
srun bash -c 'echo "Hello World from $(hostname)" >> ./"$SLURM_JOB_ID"_$(hostname).txt'
The way Bash parses that line is different from what you expect. First, $(hostname) and $SLURM_JOB_ID are expanded on the first node of the allocation (the one that runs the submission script); then srun is run, and its output is appended to the file. You need to be explicit that the redirection >> is part of what you want srun to do. With the above solution, the variable and command expansions, as well as the redirection, are performed on each node.
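In other words, assuming the batch script landed on node1 with job ID 120, the original line effectively became:
srun echo "Hello World from node1" >> ./120_node1.txt
so the output of all three tasks was appended to that single file on node1.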

Related

How to run multiple unique parallel jobs on independent nodes using slurm with master / agent set up

I have a physical model optimization program that uses a master / agent design to run unique parameterizations of the model across multiple nodes in parallel. I reserve the nodes and create the working directories using a batch script that ultimately uses a srun --multi-prog pest.conf command to call the optimization software (PEST++). The optimization program then calls a bash script which ultimately calls the model executable. I've been using something like srun -n 20 process.exe, but keep getting a "step creation temporarily disabled" error.
So the workflow is (1) call the batch script, which sets up directories and creates the multi-prog .conf script:
#SBATCH -N 4
#SBATCH --hint=nomultithread
#SBATCH -p workq
#SBATCH --time=1:00:00
(2) The resulting multi-prog pest.conf script looks like this:
0 bash -c 'cd /caldera/projects/usgs/water/waiee/wrftest/base_pp_dir_3593956 && pestpp-glm wrftest.v2.pst /h :10497'
1-3 bash -c 'cd ${WORKER_DIR}${SLURM_PROCID} && pestpp-glm wrftest.v2.pst /h nid00413:10497'
(3) wrftest.v2.pst calls a bash script which ultimately calls the model:
printf "Running WRF-H \n"
srun -n 20 ./wrf_hydro_NoahMP.exe
wait
printf "Finished WRF-H Run.\n\n"
Simply calling srun -n 20 ./wrf_hydro.exe from the command line works as expected, so I'm wondering if Slurm isn't recognizing the final srun command, resulting in the "step creation temporarily disabled" error?
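For context, a minimal sketch of the step (1) batch script as described above (the directory setup and pest.conf generation are elided; the exact srun invocation is an assumption based on the description):
#!/bin/bash
#SBATCH -N 4
#SBATCH --hint=nomultithread
#SBATCH -p workq
#SBATCH --time=1:00:00
# ... create the working directories and write pest.conf ...
srun -n 4 --multi-prog pest.conf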

Slurm: Error when submitting to multiple nodes ("slurmstepd: error: execve(): python: No such file or directory")

I have a bash script submit.sh for submitting training jobs to a Slurm server. It works as follows. Doing
bash submit.sh p1 8 config_file
will submit some task corresponding to config_file to 8 GPUs of partition p1. Each node of p1 has 4 GPUs, thus this command requests 2 nodes.
The content of submit.sh can be summarized as follows, in which I use sbatch to submit a Slurm script (train.slurm):
#!/bin/bash
# submit.sh
PARTITION=$1
NGPUs=$2
CONFIG=$3
NGPUS_PER_NODE=4
NCPUS_PER_TASK=10
sbatch --partition ${PARTITION} \
--job-name=${CONFIG} \
--output=logs/${CONFIG}_%j.log \
--ntasks=${NGPUs} \
--ntasks-per-node=${NGPUS_PER_NODE} \
--cpus-per-task=${NCPUS_PER_TASK} \
--gres=gpu:${NGPUS_PER_NODE} \
--hint=nomultithread \
--time=10:00:00 \
--export=CONFIG=${CONFIG},NGPUs=${NGPUs},NGPUS_PER_NODE=${NGPUS_PER_NODE} \
train.slurm
Now in the Slurm script, train.slurm, I decide whether to launch the training Python script on one or multiple nodes (the ways to launch it are different in these two cases):
#!/bin/bash
# train.slurm
#SBATCH --distribution=block:block
# Load Python environment
module purge
module load pytorch/py3/1.6.0
set -x
if [ ${NGPUs} -gt ${NGPUS_PER_NODE} ]; then # Multi-node training
# Some variables needed for the training script
export MASTER_PORT=12340
export WORLD_SIZE=${NGPUs}
# etc.
srun python train.py --cfg ${CONFIG}
else # Single-node training
python -u -m torch.distributed.launch --nproc_per_node=${NGPUS_PER_NODE} --use_env train.py --cfg ${CONFIG}
fi
Now if I submit on a single node (e.g., bash submit.sh p1 4 config_file), it works as expected. However, submitting on multiple nodes (e.g., bash submit.sh p1 8 config_file) produces the following error:
slurmstepd: error: execve(): python: No such file or directory
This means that the Python environment was not recognized on one of the nodes. I tried replacing python with $(which python) to use the full path to the Python binary of the virtual environment, but then I obtained another error:
OSError: libmpi_cxx.so.40: cannot open shared object file: No such file or directory
If I don't use submit.sh but instead add all the #SBATCH options to train.slurm and submit the job using sbatch directly from the command line, then it works. Therefore, it seems that wrapping sbatch inside a bash script causes this issue.
Could you please help me to resolve this?
Thank you so much in advance.
Beware that the --export parameter causes the environment of the job to be reset to exactly the SLURM_* variables plus the ones explicitly listed, in your case CONFIG, NGPUs, and NGPUS_PER_NODE. Consequently, the PATH variable is not set, and srun cannot find the python executable.
Note that --export does not alter the environment of the submission script itself, so the single-node case, which does not use srun, does indeed run fine.
Try submitting with
--export=ALL,CONFIG=${CONFIG},NGPUs=${NGPUs},NGPUS_PER_NODE=${NGPUS_PER_NODE} \
Note the added ALL as first item in the list.
Another option is to simply remove the --export line entirely and export the variables explicitly in the submit.sh script, as the submission environment is propagated to the job by Slurm by default.
export PARTITION=$1
export NGPUs=$2
export CONFIG=$3
export NGPUS_PER_NODE=4
export NCPUS_PER_TASK=10
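With the variables exported this way, the sbatch call itself needs no --export at all; a sketch of the trimmed command:
sbatch --partition ${PARTITION} \
--job-name=${CONFIG} \
--output=logs/${CONFIG}_%j.log \
--ntasks=${NGPUs} \
--ntasks-per-node=${NGPUS_PER_NODE} \
--cpus-per-task=${NCPUS_PER_TASK} \
--gres=gpu:${NGPUS_PER_NODE} \
--hint=nomultithread \
--time=10:00:00 \
train.slurm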

How do I create a new directory for a Slurm job prior to setting the working directory?

I want to create a unique directory for each Slurm job I run. However, mkdir appears to interrupt SBATCH commands. E.g. when I try:
#!/bin/bash
#SBATCH blah blah other Slurm commands
mkdir /path/to/my_dir_$SLURM_JOB_ID
#SBATCH --chdir=/path/to/my_dir_$SLURM_JOB_ID
touch test.txt
...the Slurm execution faithfully creates the directory at /path/to/my_dir_$SLURM_JOB_ID, but skips over the --chdir command and executes the sbatch script from the working directory the batch was called from.
Is there a way to create a unique directory for the output of a job and set the working directory there within a single sbatch script?
First off, the #SBATCH options must be at the top of the file, and citing the documentation
before any executable commands
So it is expected behaviour that --chdir is not honoured in this case. The rationale is that the #SBATCH options, and --chdir in particular, are used by Slurm to set up the environment in which the job starts. That environment must be decided before the job starts and cannot be modified afterwards by Slurm.
For similar reasons, environment variables are not processed in #SBATCH options; they are simply ignored by Bash, as they sit on commented lines, and Slurm makes no effort to expand them itself.
Also note that --chdir is used to
Set the working directory of the batch script to directory before it is executed.
and that directory must exist. Slurm will not create it for you.
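If you do want --chdir, the directory therefore has to exist at submission time; since the job ID is not known yet, you would need some other unique name. A sketch, using a timestamp as a stand-in (job.sh is a hypothetical script name):
WORKDIR=/path/to/my_dir_$(date +%s)
mkdir -p "$WORKDIR"
sbatch --chdir="$WORKDIR" job.sh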
What you need to do is call the cd command in your script.
#!/bin/bash
#SBATCH blah blah other Slurm commands
WORKDIR=/path/to/my_dir_$SLURM_JOB_ID
mkdir -p "$WORKDIR" && cd "$WORKDIR" || exit 1
touch test.txt
Note the exit 1 so that, if the directory creation fails, your job stops rather than continuing in the submission directory.
As a side note, it is always a good idea to add a set -euo pipefail line to your script. It makes sure your script stops if any command in it fails.
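With that line at the top, the explicit exit is not even needed, since any failing command aborts the script; a sketch:
#!/bin/bash
#SBATCH blah blah other Slurm commands
set -euo pipefail
WORKDIR=/path/to/my_dir_$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cd "$WORKDIR"
touch test.txt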

How can I convert my script for submitting SLURM jobs from Bash to Perl?

I have the following Bash script for job submission to SLURM on a cluster:
#!/bin/bash
#SBATCH -A 1234
#SBATCH -t 2-00:00
#SBATCH -n 24
module add xxx
srun resp.com
The #SBATCH lines are SLURM commands:
#SBATCH -A 1234 is the project number (1234)
#SBATCH -t 2-00:00 is the job time
#SBATCH -n 24 is the number of cores
module add xxx loads the Environment Module xxx (in this case I'm actually using module add gaussian, where gaussian is a computational quantum-chemistry program).
srun is the SLURM command to launch a job. resp.com includes commands for gaussian and atom coordinates.
I tried converting the Bash script to the following Perl script, but it didn't work. How can I do this in Perl?
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
system ("#SBATCH -A 1234");
system ("#SBATCH -t 2-00:00");
system ("#SBATCH -n 24");
system ("module add xxx");
system ("srun resp.com ");
Each of your system calls creates a child process to run the program in question and returns when the child process exits.
The whole point of module is to configure the current shell by, among other things, modifying its environment. When that process completes, say goodbye to those changes. The call to srun, in its shiny new process with a shiny new environment, hasn't got a chance.
Steps forward:
Understand SLURM and Bash, and exactly why system("#SBATCH whatever"); is of no value. Hint: # marks the beginning of a comment in both Bash and Perl.
Understand what module add is doing with xxx and how you might replicate what it's doing inside the shell within the Perl interpreter. ThisSuitIsBlackNot recommends use Env::Modulecmd { load => 'foo/1.0' }; to replicate this functionality.
Barring any understanding of module add, system ('module add xxx; srun resp.com') would put those two commands in the same shell process, but at this point you need to ask yourself what you've gained by adding a Perl interpreter to the mix.
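You can see the same effect directly in a shell, with no Perl involved; a toy illustration:
bash -c 'export GREETING=hello'   # sets GREETING in a child shell that exits immediately
echo "GREETING is: $GREETING"     # prints nothing after the colon: the parent never saw the change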
What you need to do is write
#!/usr/bin/perl
#SBATCH -A 1234
#SBATCH -t 2-00:00
#SBATCH -n 24
use strict;
use warnings;
use diagnostics;
system ("module add xxx && srun resp.com ");
and then submit it with
sbatch my_perl_script.pl
The #SBATCH lines are comments destined to be parsed by the sbatch command. They must be comments in the submission script.
The module command modifies the environment, but that environment is lost as soon as the command returns if you invoke it with system on its own, since system creates a subshell. You need to either invoke it in the same subshell as srun, as shown above, or use Perl tools to load the module into the environment of the Perl script itself so that it is available to srun, with use Env::Modulecmd { load => 'foo/1.0' }; as mentioned elsewhere.

Providing standard input to a fortran code running on a cluster running SLURM

I have a code that I have successfully installed on several computing clusters that use a PBS queuing system, but I have hit a substantial stumbling block installing it on a cluster that uses the SLURM queuing system. The bulk of the code runs fine, but the code needs to be provided with its filename (which changes with each calculation), and it expects to receive it on standard input:
character*8 name
read (5,'(a8)') name
and I provide this standard input to the cluster by:
srun_ps $1/$2.exe << EOD
$2
EOD
where $1 is the path of the executable, $2 is the filename, and srun_ps appears to be the cluster's own MPI launcher script. For the record, this bit of code works fine on the clusters I have used with a PBS queuing system.
However, what I get here is an "end-of-file during read, unit 5, file stdin" error.
Also, if I run a similar command on the command line of the login server (where the jobs are submitted from):
c helloworld.for
      character*5 name
      read (5,'(a5)') name
      write (6,'(a5)') name
      end
command line:
ifort -o helloworld.exe helloworld.for
./helloworld.exe << EOD
hello
EOD
provides the correct output of "hello". If I submit the same job to the cluster I again get an "end-of-file" error.
The full job submission script is:
#!/bin/bash
#SBATCH -o /home/Simulation/file.job.o
#SBATCH -D /home/Simulation/
#SBATCH -J file.job
#SBATCH --clusters=mpp1
#SBATCH --get-user-env
#SBATCH --ntasks=12
#SBATCH --time=1:00:00
source /etc/profile.d/modules.sh
/home/script/runjob /home/Simulation/ file
and the relevant part of the runjob script is (the rest of the script copies relevant input files and cleans up after the calculation has completed):
#!/bin/sh
time srun_ps $1/$2.exe << EOD
$2
EOD
I realise this is probably an entirely too specific problem, but any advice would be appreciated.
David.
Try adding a line such as
#SBATCH -i filename
to your job submission script, replacing filename with whatever cryptic macro ($3 or whatever) will be expanded when you submit the script. Or, you might put this in your srun command, something like
srun_ps -i filename $1/$2.exe
but I admit to some confusion about what gets called when in your scripts.
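For instance, assuming srun_ps forwards options to srun, the runjob script could write the name to a file and point srun's --input option at it instead of using a here-document (the .stdin filename is a hypothetical choice):
#!/bin/sh
# Write the model name to a scratch file and hand it to the tasks as stdin
echo "$2" > "$1/$2.stdin"
time srun_ps --input="$1/$2.stdin" "$1/$2.exe"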
