What does the keyword --exclusive mean in Slurm?

This is a follow-up question to [How to run jobs in parallel using one slurm batch script?]. The goal was to create a single sbatch script that can start multiple processes and run them in parallel. The answer given by damienfrancois was very detailed and looked something like this:
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=/dev/null
#SBATCH --error=/dev/null
#SBATCH --partition=All
srun -n 1 -c 1 --exclusive sleep 60 &
srun -n 1 -c 1 --exclusive sleep 60 &
....
wait
However, I am not able to understand the --exclusive keyword. If I use it, one node of the cluster is chosen and all processes are launched there, whereas I would like Slurm to distribute the steps (the "sleeps") over the entire cluster.
So how does the --exclusive keyword work? According to the Slurm documentation, the restriction to one node should not happen, since the keyword is used within a step allocation.
(I am new to Slurm.)

Related

Using OpenMP and OpenMPI together under Slurm

I have written a C++ program that uses both OpenMP and OpenMPI. I want to use (let's say) 3 nodes (so size_Of_Cluster should be 3) and use OpenMP in each node to parallelize the for loop (there are 24 cores in a node). In essence, I want MPI ranks to be assigned to nodes. The Slurm script I have written is as follows. (I have tried many variations but could not come up with the "correct" one. I would be grateful if you could help me.)
#!/bin/bash
#SBATCH -N 3
#SBATCH -n 72
#SBATCH -p defq
#SBATCH -A akademik
#SBATCH -o %J.out
#SBATCH -e %J.err
#SBATCH --job-name=MIXED
module load slurm
module load shared
module load gcc
module load openmpi
export OMP_NUM_THREADS=24
mpirun -n 3 --bynode ./program
Using srun did not help.
The relevant lines are:
#SBATCH -N 3
#SBATCH -n 72
export OMP_NUM_THREADS=24
This means you have 72 MPI processes, and each creates 24 threads. For that to be efficient you would need 24x72 cores, which you don't have. You should specify:
#SBATCH -n 3
Then you will have 3 processes, with 24 threads per process.
You don't have to worry about the placement of the ranks on the nodes: that is done by the runtime. You could, for instance, have each process print the result of MPI_Get_processor_name to confirm.
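Putting the answer's fix into the question's script, a corrected submission file might look like this. This is a sketch, not a tested script: the partition, account, modules, and program name are copied from the question, while the --cpus-per-task line is an assumption to reserve the 24 cores each rank's OpenMP threads will use.

```shell
#!/bin/bash
#SBATCH -N 3
#SBATCH -n 3                  # 3 MPI ranks in total, one per node
#SBATCH --cpus-per-task=24    # assumed: reserve 24 cores per rank for OpenMP
#SBATCH -p defq
#SBATCH -A akademik
#SBATCH -o %J.out
#SBATCH -e %J.err
#SBATCH --job-name=MIXED
module load slurm shared gcc openmpi
export OMP_NUM_THREADS=24
mpirun -n 3 --bynode ./program
```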

SLURM How to start a script once per node

I have a Big Cluster available through SLURM.
I want to start my script e.g. ./calc on every requested node with a specified amount of cores. So for example on 2 nodes, 16 cores each.
I start with an sbatch script:
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
srun -N 1 ./calc 2 &
srun -N 1 ./calc 2 &
wait
It doesn't work as intended though.
I tried many configurations of --ntasks, --nodes, and --cpus-per-task but nothing worked and I'm very lost.
I also don't understand the difference between tasks and CPUs in SLURM.
In your example, you ask Slurm to launch 16 tasks per node, on 2 nodes, so each srun step inherits that allocation rather than starting a single copy of ./calc.
For your needs, you don't need to request 2 nodes specifically; what matters is that the job has two tasks, which may or may not end up on two different nodes.
For your example, run the following sbatch :
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=16
#SBATCH --hint=nomultithread
srun <my program>
In this example, Slurm will run the program 2 times, with 16 cores each. The nomultithread hint is optional and depends on the cluster configuration: if hyper-threading is activated and the hint is omitted, the 16 CPUs will be virtual CPUs (hardware threads) rather than physical cores.
I found this to be a working solution. It turned out the most important thing was to define all three parameters: nodes, tasks, and CPUs.
#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
srun -N 1 -n 1 -c 16 ./calc 2 &
srun -N 1 -n 1 -c 16 ./calc 2 &
wait

How to run jobs in parallel using one slurm batch script?

I am trying to run multiple python scripts in parallel with one Slurm batch script. Take a look at the example below:
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=/dev/null
#SBATCH --error=/dev/null
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --partition=All
#SBATCH --time=5:00
srun sleep 60
srun sleep 60
wait
How do I tweak the script so that the execution takes only 60 seconds (instead of 120)? Splitting the script into two scripts is not an option.
As written, that script is running two sleep commands in parallel, two times in a row.
Each srun command initiates a step, and since you set --ntasks=2 each step instantiates two tasks (here the sleep command).
If you want to run two 1-task steps in parallel, you should write it this way:
srun --exclusive -n 1 -c 1 sleep 60 &
srun --exclusive -n 1 -c 1 sleep 60 &
wait
Then each step only instantiates one task, and is backgrounded by the & delimiter, meaning the next srun can start immediately. The wait command makes sure the script terminates only when both steps are finished.
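The effect of & plus wait can be checked outside Slurm with plain bash; the same logic applies when each backgrounded line is an srun step:

```shell
#!/bin/bash
# Two backgrounded sleeps run concurrently, so the total
# wall time is about 2 seconds rather than 4.
start=$(date +%s)
sleep 2 &
sleep 2 &
wait            # return only once both background jobs are done
end=$(date +%s)
echo "elapsed: $((end - start))s"
```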
In that context, the xargs and GNU parallel commands can be useful to avoid writing multiple identical srun lines or resorting to a for-loop.
For instance, if you have multiple files you need to run your script over:
find /path/to/data/*.csv -print0 | xargs -0 -n1 -P $SLURM_NTASKS srun -n1 --exclusive python my_python_script.py
This is equivalent to writing as many
srun -n 1 -c 1 --exclusive python my_python_script.py /path/to/data/file1.csv &
srun -n 1 -c 1 --exclusive python my_python_script.py /path/to/data/file2.csv &
srun -n 1 -c 1 --exclusive python my_python_script.py /path/to/data/file3.csv &
[...]
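The fan-out behaviour of xargs -P can be tried locally by substituting echo for the srun wrapper (the file names below are made up for the demonstration):

```shell
#!/bin/bash
# xargs reads NUL-separated items and keeps up to 3 commands
# running at a time, one item per invocation (-n1).
printf '%s\0' file1.csv file2.csv file3.csv file4.csv |
  xargs -0 -n1 -P 3 echo processing
```

Each input item becomes one `processing fileN.csv` line; with srun in place of echo, each item becomes one job step.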
GNU parallel is useful to iterate over parameter values:
parallel -P $SLURM_NTASKS srun -n1 --exclusive python my_python_script.py ::: {1..1000}
will run
python my_python_script.py 1
python my_python_script.py 2
python my_python_script.py 3
...
python my_python_script.py 1000
Another approach is to just run
srun python my_python_script.py
and, inside the Python script, to look for the SLURM_PROCID environment variable and split the work according to its value. The srun command will start multiple instances of the script and each will 'see' a different value for SLURM_PROCID.
import os
print(os.environ['SLURM_PROCID'])
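The same splitting idea can be sketched in plain shell: each task uses its SLURM_PROCID as an offset and strides by the task count. The file names are hypothetical, and the outer loop stands in for the copies that srun would start.

```shell
#!/bin/bash
files=(a.csv b.csv c.csv d.csv e.csv)
ntasks=2
# srun would launch $ntasks copies, each seeing its own
# SLURM_PROCID; the outer loop simulates that here.
for procid in 0 1; do
  # rank takes files procid, procid+ntasks, procid+2*ntasks, ...
  for ((i = procid; i < ${#files[@]}; i += ntasks)); do
    echo "rank $procid handles ${files[i]}"
  done
done
```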

How to submit a script partly as array

I'm using Slurm on my lab's server, and I would like to submit a job that looks like this:
#SBATCH ...
mkdir my/file/architecture
echo "#HEADER" > my/file/architecture/output_summary.txt
for f in my/dir/*.csv; do
python3 myscript.py $f
done
Is there any way to run this so that it will complete the first instructions, then run the for loop in parallel? Each step is independent, so they can run at the same time.
The initial steps are not very complex, so if needed I could separate them into a separate SBATCH script. my/dir/ however contains about 7000 csv files to process, so typing them all out manually would be a pain.
GNU Parallel might be a good fit here, or xargs, though I prefer parallel in Slurm jobs.
Here's an example of an sbatch script running an 8-way parallel workload:
#!/bin/sh
#SBATCH ...
#SBATCH --nodes=1
#SBATCH --ntasks=8
srun="srun --exclusive -N1 -n1"
# -j is the number of jobs parallel runs at once, so we set it to $SLURM_NTASKS
# Note that the srun options -N1 -n1 make each step start a single copy of the program on one node. We use "find" to generate the list of files to operate on.
find /my/dir/*.csv -type f | parallel -j $SLURM_NTASKS "$srun python3 myscript.py {}"
The easiest way is to run on a single node, though parallel can use SSH (I believe) to run on multiple computers.
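Without a cluster at hand, the pipeline can be rehearsed with a throwaway directory, xargs (which the answer mentions as an alternative to parallel), and echo standing in for the srun wrapper; the directory and file names here are temporary and hypothetical:

```shell
#!/bin/bash
tmp=$(mktemp -d)
touch "$tmp"/a.csv "$tmp"/b.csv "$tmp"/c.csv
# find feeds one file name per slot, up to 3 at a time,
# mirroring: find ... | parallel -j $SLURM_NTASKS "$srun python3 myscript.py {}"
find "$tmp" -name '*.csv' -print0 |
  xargs -0 -n1 -P 3 -I{} echo "would run: python3 myscript.py {}"
rm -rf "$tmp"
```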

Do I need a single bash file for each task in SLURM?

I am trying to launch several tasks in a SLURM-managed cluster, and would like to avoid dealing with dozens of files.
Right now, I have 50 tasks (indexed by i; for simplicity, i is also the input parameter of my program), and for each one a single bash file slurm_run_i.sh which specifies the computation configuration and the srun command:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -J pltCV
#SBATCH --mem=30G
srun python plotConvergence.py i
I am then using another bash file to submit all these tasks, slurm_run_all.sh
#!/bin/bash
for i in {1..50}; do
  sbatch slurm_run_$i.sh
done
This works (50 jobs are running on the cluster), but I find it troublesome to have more than 50 input files. Searching for a solution, I came up with the & operator, obtaining something like:
#!/bin/bash
#SBATCH --ntasks=50
#SBATCH --cpus-per-task=1
#SBATCH -J pltall
#SBATCH --mem=30G
# Running jobs
srun python plotConvergence.py 1 &
srun python plotConvergence.py 2 &
...
srun python plotConvergence.py 49 &
srun python plotConvergence.py 50 &
wait
echo "All done"
This seems to run as well. However, I cannot manage each of these jobs independently: the output of squeue shows a single job (pltall) running on a single node. As there are only 12 cores per node in the partition I am working in, I assume most of my tasks are waiting on the single node I have been allocated. Setting the -N option doesn't change anything either. Moreover, I can no longer cancel individual jobs if I realize there is a mistake, which sounds problematic to me.
Is my interpretation right, and is there a better way (I guess) than my attempt to process several jobs in Slurm without getting lost among many files?
What you are looking for is the job array feature of Slurm.
In your case, you would have a single submission file (slurm_run.sh) like this:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -J pltCV
#SBATCH --mem=30G
#SBATCH --array=1-50
srun python plotConvergence.py ${SLURM_ARRAY_TASK_ID}
and then submit the array of jobs with
sbatch slurm_run.sh
You will see that you will have 50 jobs submitted. You can cancel all of them at once or one by one. See the man page of sbatch for details.
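What the array does can be previewed locally by looping over the values that Slurm would hand to each array task via SLURM_ARRAY_TASK_ID; the loop below replaces the scheduler, and echo stands in for the actual srun line:

```shell
#!/bin/bash
# Slurm would start 50 independent jobs; here we just show
# the first 3 task IDs being substituted into the command.
for SLURM_ARRAY_TASK_ID in 1 2 3; do
  echo "would run: python plotConvergence.py ${SLURM_ARRAY_TASK_ID}"
done
```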
