I am using slurm on a cluster to run jobs and submit a script that looks like below with sbatch:
#!/usr/bin/env bash
#SBATCH -o slurm.sh.out
#SBATCH -p defq
#SBATCH --mail-type=ALL
#SBATCH --mail-user=my.email#something.com
echo "hello"
Can I somehow comment out a #SBATCH line, e.g. the #SBATCH --mail-user=my.email#something.com in this script? Since the slurm instructions are bash comments themselves I would not know how to achieve this.
just add another # at the beginning.
##SBATCH --mail-user...
This will not be processed by Slurm
Related
I'm trying to submit an array of jobs using slurm, but it's not working as I expected. My bash script is test.sh:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
#SBATCH --account=myaccount
#SBATCH --partition=partition
#SBATCH --time=10:00:00
###Array setup here
#SBATCH --array=1-6
#SBATCH --output=test_%a.out
echo TEST MESSAGE 1
echo $SLURM_ARRAY_TASK_ID
python test.py
The test.py code:
print('TEST MESSAGE 2')
I then submitted this job by doing:
sbatch --wrap="bash test.sh"
I'm not even sure if this is how I should run it. Because there are already SBATCH commands in the bash script, should I just be running bash test.sh?
I was expecting that 5 jobs would be submitted and that $SLURM_ARRAY_TASK_ID would increase incrementally, but that's not happening. Just one job is submitting and the output is:
TEST MESSAGE 1
TEST MESSAGE 2
So the $SLURM_ARRAY_TASK_ID never get's printed and seems to be the problem. Can anyone tell me what I'm doing wrong?
You just need to submit the script with sbatch test.sh. Using --wrap in the way you've done it just runs test.sh as a simple Bash script, so none of the Slurm-specific parts are used.
I am new to cluster computation and I want to repeat one empirical experiment 100 times on python. For each experiment, I need to generate a set of data and solve an optimization problem, then I want to obtain the averaged value. To save time, I hope to do it in parallel. For example, suppose I can use 20 cores, I only need to repeat 5 times on each core.
Here's an example of a test.slurm script that I use for running the test.py script on a single core:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=72:00:00
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=address#email
module purge
module load anaconda3/2018.12
source activate py36
python test.py
If I want to run it in multiple cores, how should I modify the slurm file accordingly?
To run the test on multiple cores, you can use srun -n option. After -n specify the number of processes, you need to launch.
srun -n 20 python test.py
srun is the launcher in slurm.
Or you can change the ntasks, cpus-per-task in slurm file.
The slurm file will look like this:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=72:00:00
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=address#email
module purge
module load anaconda3/2018.12
source activate py36
python test.py
Hello I would need help
In fact I need to execute several bash files ex:
file1.sh
file2.sh
file3.sh
file4.sh
those file will generate data that will be used for another bash file call final.sh
So in order to gain time I want to execute the fileNb.sh files sumultany on a cluster by doing :
for file in file*.sh; do sbatch $file; done
, and then when all job have been done, I would like to execute automatically the final.sh file.
Does someone have an idea ?
Thank you very much
One clean option is to reorganise the set of jobs as a job array and then add a dependency for final job on the whole array.
Assuming fileN.sh looks like this:
#!/bin/bash
#SBATCH --<some option>
#SBATCH --<some other option>
./my_program input_fileN
you can make this a job array. In a single submission file file.sh, write this
#!/bin/bash
#SBATCH --<some option>
#SBATCH --<some other option>
#SBATCH --array=1-4
./my_program input_file${SLURM_ARRAY_TASK_ID}
Then run
JOBID=$(sbatch --parsable file.sh)
sbatch --dependency after:$JOBID final.sh
In case your jobs cannot be parametrised by an integer directly, create a Bash array like this:
#!/bin/bash
#SBATCH --<some option>
#SBATCH --<some other option>
#SBATCH --array=0-2
ARGS=(SRR63563 SRR63564 SRR63565)
fasterq-dump --threads 10 ${ARGS[$SLURM_ARRAY_TASK_ID]} -O /path1/path2/path3/
You could do:
sbatch --wait file1.sh &
sbatch --wait file2.sh &
sbatch --wait file3.sh &
sbatch --wait file4.sh &
wait
sbatch final.sh
Or, more simply with GNU Parallel:
parallel -j4 sbatch --wait ::: file*.sh
sbatch final.sh
Is this no good?
for file in file*.sh; do sbatch $file; done; ./final.sh
I am trying to launch several task in a SLURM-managed cluster, and would like to avoid dealing with dozens of files.
Right now, I have 50 tasks (subscripted i, and for simplicity, i is also the input parameter of my program), and for each one a single bash file slurm_run_i.sh which indicates the computations configuration, and the srun command:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -J pltCV
#SBATCH --mem=30G
srun python plotConvergence.py i
I am then using another bash file to submit all these tasks, slurm_run_all.sh
#!/bin/bash
for i in {1..50}:
sbatch slurm_run_$i.sh
done
This works (50 jobs are running on the cluster), but I find it troublesome to have more than 50 input files. Searching a solution, I came up with the & command, obtaining something as:
#!/bin/bash
#SBATCH --ntasks=50
#SBATCH --cpus-per-task=1
#SBATCH -J pltall
#SBATCH --mem=30G
# Running jobs
srun python plotConvergence.py 1 &
srun python plotConvergence.py 2 &
...
srun python plotConvergence.py 49 &
srun python plotConvergence.py 50 &
wait
echo "All done"
Which seems to run as well. However, I cannot manage each of these jobs independently: the output of squeue shows I have a single job (pltall) running on a single node. As there are only 12 cores on each node in the partition I am working in, I am assuming most of my jobs are waiting on the single node I've been allocated to. Setting the -N option doesn't change anything too.. Moreover, I cannot cancel some jobs individually anymore if I realize there's a mistake or something, which sounds problematic to me.
Is my interpretation right, and is there a better way (I guess) than my attempt to process several jobs in slurm without being lost among many files ?
What you are looking for is the jobs array feature of Slurm.
In your case, you would have a single submission file (slurm_run.sh) like this:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -J pltCV
#SBATCH --mem=30G
#SBATCH --array=1-50
srun python plotConvergence.py ${SLURM_ARRAY_TASK_ID}
and then submit the array of jobs with
sbatch slurm_run.sh
You will see that you will have 50 jobs submitted. You can cancel all of them at once or one by one. See the man page of sbatch for details.
I am running a bash script to run jobs on Linux clusters, using SLURM. The relevant part of the script is given below (slurm.sh):
#!/bin/bash
#SBATCH -p parallel
#SBATCH --qos=short
#SBATCH --exclusive
#SBATCH -o out.log
#SBATCH -e err.log
#SBATCH --open-mode=append
#SBATCH --cpus-per-task=1
#SBATCH -J hadoopslurm
#SBATCH --time=01:30:00
#SBATCH --mem-per-cpu=1000
#SBATCH --mail-user=amukherjee708#gmail.com
#SBATCH --mail-type=ALL
#SBATCH -N 5
I am calling this script from another script (ext.sh), a part of which is given below:
#!/bin/bash
for i in {1..3}
do
source slurm.sh
done
..
I want to manipulate the value of the N variable is slurm.sh (#SBATCH -N 5) by setting it to values like 3,6,8 etc, inside the for loop of ext.sh. How do I access the variable programmatically from ext.sh? Please help.
First note that if you simply source the shell script, you will not submit a job to Slurm, you will simply run the job on the submission node. So you need to write
#!/bin/bash
for i in {1..3}
do
sbatch slurm.sh
done
Now if you want to change the -N programmatically, one option is to remove it from the file slurm.sh and add it as argument to the sbatch command:
#!/bin/bash
for i in {1..3}
do
sbatch -N $i slurm.sh
done
The above script will submit three jobs with respectively 1, 2, and 3 nodes requested.