How to correctly submit an array of jobs using slurm - bash

I'm trying to submit an array of jobs using slurm, but it's not working as I expected. My bash script is test.sh:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH --account=myaccount
#SBATCH --partition=partition
#SBATCH --time=10:00:00
###Array setup here
#SBATCH --array=1-6
#SBATCH --output=test_%a.out
echo TEST MESSAGE 1
echo $SLURM_ARRAY_TASK_ID
python test.py
The test.py code:
print('TEST MESSAGE 2')
I then submitted this job by doing:
sbatch --wrap="bash test.sh"
I'm not even sure if this is how I should run it. Because there are already SBATCH commands in the bash script, should I just be running bash test.sh?
I was expecting that six jobs would be submitted (one per array index) and that $SLURM_ARRAY_TASK_ID would increase incrementally, but that's not happening. Only one job is submitted and the output is:
TEST MESSAGE 1
TEST MESSAGE 2
So $SLURM_ARRAY_TASK_ID never gets printed, which seems to be the problem. Can anyone tell me what I'm doing wrong?

You just need to submit the script with sbatch test.sh. Using --wrap the way you did wraps the command bash test.sh in a new, minimal batch script, so the #SBATCH directives inside test.sh are never parsed by sbatch: the job runs as a single ordinary job, no array is created, and $SLURM_ARRAY_TASK_ID is never set.
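To submit it correctly, pass the script itself to sbatch so that the #SBATCH header is read:
sbatch test.sh
Slurm will then create six array tasks (IDs 1 through 6), each writing its output to test_<taskid>.out (from the --output=test_%a.out pattern), and $SLURM_ARRAY_TASK_ID will be set inside each task.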

Related

execute bash files when previous ones have finished to run

Hello, I need some help.
I need to execute several bash files, e.g.:
file1.sh
file2.sh
file3.sh
file4.sh
Those files will generate data that will be used by another bash file called final.sh.
So, in order to save time, I want to execute the fileN.sh files simultaneously on a cluster by doing:
for file in file*.sh; do sbatch $file; done
and then, when all the jobs have finished, I would like to execute the final.sh file automatically.
Does someone have an idea?
Thank you very much.
One clean option is to reorganise the set of jobs as a job array and then add a dependency for the final job on the whole array.
Assuming fileN.sh looks like this:
#!/bin/bash
#SBATCH --<some option>
#SBATCH --<some other option>
./my_program input_fileN
you can turn this into a job array. In a single submission file, file.sh, write:
#!/bin/bash
#SBATCH --<some option>
#SBATCH --<some other option>
#SBATCH --array=1-4
./my_program input_file${SLURM_ARRAY_TASK_ID}
Then run:
JOBID=$(sbatch --parsable file.sh)
sbatch --dependency=afterok:$JOBID final.sh
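The afterok dependency makes final.sh start only once every task of the array has completed successfully; use afterany:$JOBID instead if final.sh should run regardless of the tasks' exit status.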
In case your jobs cannot be parametrised by an integer directly, create a Bash array like this:
#!/bin/bash
#SBATCH --<some option>
#SBATCH --<some other option>
#SBATCH --array=0-2
ARGS=(SRR63563 SRR63564 SRR63565)
fasterq-dump --threads 10 ${ARGS[$SLURM_ARRAY_TASK_ID]} -O /path1/path2/path3/
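Note that the array range 0-2 matches Bash's zero-based indexing of ARGS, so task 0 processes SRR63563, task 1 processes SRR63564, and task 2 processes SRR63565.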
You could do:
sbatch --wait file1.sh &
sbatch --wait file2.sh &
sbatch --wait file3.sh &
sbatch --wait file4.sh &
wait
sbatch final.sh
Or, more simply with GNU Parallel:
parallel -j4 sbatch --wait ::: file*.sh
sbatch final.sh
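In both variants, sbatch --wait blocks until the submitted job terminates, so sbatch final.sh only runs once all four jobs have finished.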
Is this no good?
for file in file*.sh; do sbatch $file; done; ./final.sh

Do I need a single bash file for each task in SLURM?

I am trying to launch several task in a SLURM-managed cluster, and would like to avoid dealing with dozens of files.
Right now, I have 50 tasks (indexed by i; for simplicity, i is also the input parameter of my program), and for each one a single bash file slurm_run_i.sh containing the compute configuration and the srun command:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -J pltCV
#SBATCH --mem=30G
srun python plotConvergence.py i
I am then using another bash file, slurm_run_all.sh, to submit all these tasks:
#!/bin/bash
for i in {1..50}; do
    sbatch slurm_run_$i.sh
done
This works (50 jobs run on the cluster), but I find it troublesome to maintain more than 50 submission files. Searching for a solution, I came up with the & operator, obtaining something like:
#!/bin/bash
#SBATCH --ntasks=50
#SBATCH --cpus-per-task=1
#SBATCH -J pltall
#SBATCH --mem=30G
# Running jobs
srun python plotConvergence.py 1 &
srun python plotConvergence.py 2 &
...
srun python plotConvergence.py 49 &
srun python plotConvergence.py 50 &
wait
echo "All done"
This seems to run as well. However, I cannot manage each of these jobs independently: the output of squeue shows I have a single job (pltall) running on a single node. As there are only 12 cores on each node in the partition I am working in, I assume most of my tasks are waiting on the single node I've been allocated. Setting the -N option doesn't change anything either. Moreover, I can no longer cancel individual jobs if I realize there's a mistake, which sounds problematic to me.
Is my interpretation right, and is there a better way than my attempt to run several jobs in Slurm without getting lost among many files?
What you are looking for is the job array feature of Slurm.
In your case, you would have a single submission file (slurm_run.sh) like this:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -J pltCV
#SBATCH --mem=30G
#SBATCH --array=1-50
srun python plotConvergence.py ${SLURM_ARRAY_TASK_ID}
and then submit the array of jobs with
sbatch slurm_run.sh
You will see that 50 jobs are submitted. You can cancel all of them at once or one by one. See the sbatch man page for details.
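For example, assuming the array was assigned job ID 12345 (a hypothetical ID for illustration), you could cancel the whole array or a single task with scancel:
scancel 12345      # cancels every task of the array
scancel 12345_7    # cancels only task 7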

comment in bash script processed by slurm

I am using Slurm on a cluster to run jobs, and I submit a script that looks like the one below with sbatch:
#!/usr/bin/env bash
#SBATCH -o slurm.sh.out
#SBATCH -p defq
#SBATCH --mail-type=ALL
#SBATCH --mail-user=my.email@something.com
echo "hello"
Can I somehow comment out a #SBATCH line, e.g. the #SBATCH --mail-user=my.email@something.com, in this script? Since the Slurm instructions are themselves bash comments, I don't know how to achieve this.
Just add another # at the beginning:
##SBATCH --mail-user...
This line will not be processed by Slurm.
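A minimal sketch of the script above with that directive disabled (sbatch only parses lines starting with exactly #SBATCH, while bash treats both forms as comments):
#!/usr/bin/env bash
#SBATCH -o slurm.sh.out
#SBATCH -p defq
#SBATCH --mail-type=ALL
# The next line starts with ## so sbatch ignores it:
##SBATCH --mail-user=my.email@something.com
echo "hello"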

Changing the bash script sent to sbatch in slurm during run a bad idea?

I wanted to run a python script main.py multiple times with different arguments through a sbatch_run.sh script as in:
#!/bin/bash
#SBATCH --job-name=sbatch_run
#SBATCH --array=1-1000
#SBATCH --exclude=node047
arg1=10 # arg to be changed between runs
arg2=12 # arg to be changed between runs
python main.py $arg1 $arg2
The arguments are hard-coded in the bash file that sbatch runs. I was worried that if I submitted sbatch_run.sh multiple times, one after the other, while changing the values of arg1 and arg2 between submissions, it might cause errors in my runs. For example, if I do:
sbatch sbatch_run.sh # with arg1=10 and arg2=12
and then, immediately afterwards, I change sbatch_run.sh and submit it again, as in:
sbatch sbatch_run.sh # with arg1=69 and arg2=666
would that cause all my runs to use the last values (i.e. arg1=69 and arg2=666) instead of each run using its own arguments?
I know for sure that if I hard-code the arguments in main.py, submit the same sbatch script, and then change main.py, the jobs will run the latest main.py. I was wondering if the same is true when I change the sbatch_run.sh script.
Just so you know, I did try this experiment: I submitted 1000 jobs with a sleep command so that some stayed queued, and then changed sbatch_run.sh. It did not seem to change what my runs did; however, this is too important to get wrong by accident, so I wanted to ask and make sure.
For the record I ran:
#!/bin/bash
#SBATCH --job-name=ECHO
#SBATCH --array=1-1000
#SBATCH --exclude=node047
sleep 15
echo helloworld
echo 5
and then change the echo to echo 10 or echo byebyeworld.
When sbatch is run, Slurm copies the submission script to its internal database; you can convince yourself of this with the following experiment:
$ cat submit.sh
#!/bin/bash
#SBATCH --hold
echo helloworld
The --hold is there to make sure the job does not start. Submit it:
$ sbatch submit.sh
Then modify the submission script:
$ sed -i 's/hello/bye/' submit.sh
$ cat submit.sh
#!/bin/bash
#SBATCH --hold
echo byeworld
and now use scontrol show job to see the script Slurm is planning to run:
$ scontrol show -ddd job YOURJOBID
JobId=******* JobName=submit.sh
[...]
BatchScript=
#!/bin/bash
#SBATCH --hold
echo helloworld
[...]
It hasn't changed although the original script has.
[EDIT] Recent versions of Slurm use scontrol write batch_script - rather than scontrol show -dd job to show the submission script.
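A minimal sketch of that newer command, where YOURJOBID is the placeholder job ID from above and the trailing - tells scontrol to write the stored batch script to standard output instead of a file:
scontrol write batch_script YOURJOBID -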

Change the value of external SLURM variable

I am running a bash script to run jobs on Linux clusters, using SLURM. The relevant part of the script is given below (slurm.sh):
#!/bin/bash
#SBATCH -p parallel
#SBATCH --qos=short
#SBATCH --exclusive
#SBATCH -o out.log
#SBATCH -e err.log
#SBATCH --open-mode=append
#SBATCH --cpus-per-task=1
#SBATCH -J hadoopslurm
#SBATCH --time=01:30:00
#SBATCH --mem-per-cpu=1000
#SBATCH --mail-user=amukherjee708@gmail.com
#SBATCH --mail-type=ALL
#SBATCH -N 5
I am calling this script from another script (ext.sh), a part of which is given below:
#!/bin/bash
for i in {1..3}
do
source slurm.sh
done
..
I want to manipulate the value of the -N option in slurm.sh (#SBATCH -N 5) by setting it to values like 3, 6, 8, etc., inside the for loop of ext.sh. How do I set this value programmatically from ext.sh? Please help.
First note that if you simply source the shell script, you will not submit a job to Slurm; you will simply run the commands on the submission node. So you need to write:
#!/bin/bash
for i in {1..3}
do
sbatch slurm.sh
done
Now, if you want to change -N programmatically, one option is to remove it from slurm.sh and pass it as an argument to the sbatch command:
#!/bin/bash
for i in {1..3}
do
sbatch -N $i slurm.sh
done
The above script will submit three jobs, requesting 1, 2, and 3 nodes respectively.
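Note that command-line options passed to sbatch override the corresponding #SBATCH directives in the script, so you could also keep #SBATCH -N 5 in slurm.sh as a default and still override it at submission time, for example:
sbatch -N 8 slurm.sh   # requests 8 nodes even though the script says #SBATCH -N 5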
