Slurm heterogeneous job for multiple tests

I have to perform some tests on an HPC cluster, and I am using Slurm as the workload manager.
Since I have to run similar tests on different allocations, I decided to exploit Slurm's heterogeneous job support.
Here is my Slurm script:
#!/bin/bash
# begin of slurm_script.sh
#SBATCH -p my_partition
#SBATCH --exclusive
#SBATCH --time 16:00:00 # format: HH:MM:SS
#SBATCH -N 1 # 1 node
#SBATCH --ntasks-per-node=32 # tasks out of 128
#SBATCH --gres=gpu:4 # gpus per node out of 4
#SBATCH --mem=246000 # memory per node out of 246000MB
#SBATCH hetjob
#SBATCH -p my_partition
#SBATCH --exclusive
#SBATCH --time 16:00:00 # format: HH:MM:SS
#SBATCH -N 2 # 2 nodes
#SBATCH --ntasks-per-node=32 # tasks out of 128
#SBATCH --gres=gpu:4 # gpus per node out of 4
#SBATCH --mem=246000 # memory per node out of 246000MB
#SBATCH hetjob
#SBATCH -p my_partition
#SBATCH --exclusive
#SBATCH --time 16:00:00 # format: HH:MM:SS
#SBATCH -N 4 # 4 nodes
#SBATCH --ntasks-per-node=32 # tasks out of 128
#SBATCH --gres=gpu:4 # gpus per node out of 4
#SBATCH --mem=246000 # memory per node out of 246000MB
#SBATCH hetjob
#SBATCH -p my_partition
#SBATCH --exclusive
#SBATCH --time 16:00:00 # format: HH:MM:SS
#SBATCH -N 8 # 8 nodes
#SBATCH --ntasks-per-node=32 # tasks out of 128
#SBATCH --gres=gpu:4 # gpus per node out of 4
#SBATCH --mem=246000 # memory per node out of 246000MB
#SBATCH hetjob
#SBATCH -p my_partition
#SBATCH --exclusive
#SBATCH --time 16:00:00 # format: HH:MM:SS
#SBATCH -N 16 # 16 nodes
#SBATCH --ntasks-per-node=32 # tasks out of 128
#SBATCH --gres=gpu:4 # gpus per node out of 4
#SBATCH --mem=246000 # memory per node out of 246000MB
srun --job-name=job1 --output=4cpu_%N_%j.out --het-group=0 script.sh 4
srun --job-name=job2 --output=8cpu_%N_%j.out --het-group=0 script.sh 8
srun --job-name=job3 --output=16cpu_%N_%j.out --het-group=0 script.sh 16
srun --job-name=job4 --output=32cpu_%N_%j.out --het-group=0 script.sh 32
srun --job-name=job5 --output=64cpu_%N_%j.out --het-group=1 script.sh 64
srun --job-name=job6 --output=128cpu_%N_%j.out --het-group=2 script.sh 128
srun --job-name=job7 --output=256cpu_%N_%j.out --het-group=3 script.sh 256
srun --job-name=job8 --output=512cpu_%N_%j.out --het-group=4 script.sh 512
Here script.sh takes the number of processors as its argument and has the form
make cpp_program_I_need_to_run
mkdir -p my_results
mpirun -n $1 cpp_program_I_need_to_run
# other tasks
When I run sbatch slurm_script.slurm on my cluster, the launched jobs crash with exit code 8 and the following errors:
cat slurm-8482798.out
srun: error: r242n13: tasks 0-31: Exited with exit code 8
srun: launch/slurm: _step_signal: Terminating StepId=8482798.0
srun: error: r242n13: tasks 0-31: Exited with exit code 8
srun: launch/slurm: _step_signal: Terminating StepId=8482798.1
srun: error: r242n13: tasks 0-31: Exited with exit code 8
srun: launch/slurm: _step_signal: Terminating StepId=8482798.2
srun: error: r242n13: tasks 0-31: Exited with exit code 8
srun: launch/slurm: _step_signal: Terminating StepId=8482798.3
...
and also:
slurmstepd: error: Unable to create TMPDIR [/scratch_local/slurm_job.8482798]: Permission denied
slurmstepd: error: Setting TMPDIR to /tmp
slurmstepd: error: execve(): /cluster/home/userexternal/username/myfolder/script.sh: Exec format error
slurmstepd: error: execve(): /cluster/home/userexternal/username/myfolder/script.sh: Exec format error
...
and so on for many lines.
Is there a way to make it work? The only thing I can think of is that the mpirun call in my script.sh is redundant, but beyond that I don't have many ideas.
Thank you in advance.

Indeed, the mpirun command is redundant. Could you clarify what script.sh is expected to do? The Exec format error also suggests that script.sh is missing a shebang (#!/bin/bash) on its first line, so the system does not know how to execute it.
My approach would be to run the make in advance, place mkdir -p my_results just after the #SBATCH directives (assuming the directory is on a filesystem shared by all job components; otherwise use environment variables to point to node-local storage), and replace mpirun with srun ... cpp_program_I_need_to_run.
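For illustration, a minimal sketch of that restructuring might look like this (the het-group directives are abbreviated, script.sh is folded into the batch script for brevity, and the binary is assumed to land in the working directory):
#!/bin/bash
#SBATCH -p my_partition
#SBATCH --exclusive
#SBATCH --time 16:00:00
#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --gres=gpu:4
#SBATCH --mem=246000
# ... "#SBATCH hetjob" plus the 2-, 4-, 8- and 16-node blocks exactly as in the original ...

# Build once and create the (shared) results directory before any job step starts
make cpp_program_I_need_to_run
mkdir -p my_results

# Let srun start the MPI ranks directly instead of calling mpirun inside script.sh;
# -n sets the number of ranks for each step
srun --job-name=job1 --output=4cpu_%N_%j.out   --het-group=0 -n 4   ./cpp_program_I_need_to_run
srun --job-name=job4 --output=32cpu_%N_%j.out  --het-group=0 -n 32  ./cpp_program_I_need_to_run
srun --job-name=job8 --output=512cpu_%N_%j.out --het-group=4 -n 512 ./cpp_program_I_need_to_run
# ... and so on for the remaining step sizes ...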

Related

slurm Read-only file system

The error that I got from slurm:
slurmstepd: error: unlink(/var/lib/slurm-llnl/slurmd/job856638/slurm_script): Read-only file system
slurmstepd: error: rmdir(/var/lib/slurm-llnl/slurmd/job856638): Read-only file system
slurmstepd: error: Unable to unlink domain socket `/var/lib/slurm-llnl/slurmd/node18_856638.4294967294`: Read-only file system
The script:
#!/bin/bash
#
#SBATCH --job-name=short_N2_rep2b
#SBATCH --output=/mnt/beegfs/home1/slurm_out/short_N2_rep2b.%N.%j.out
#SBATCH --error=/mnt/beegfs/home1/slurm_err/short_N2_rep2b.%N.%j.err
#SBATCH --ntasks=1
#SBATCH -N 1
#SBATCH -n 12
#SBATCH --mem-per-cpu=8000
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=***
python2 /mnt/beegfs/home1/hicPipe/hicPipe.py short --exReg /mnt/beegfs/home1/blacklisted_region/ce11-blacklist.bed --nThread 12 /mnt/beegfs/home1/celegans_reference_genome/PRJNA13758/index/c_elegans.PRJNA13758.WS285.genomic.fa.gz /mnt/beegfs/home1/ARC-C/fastq/N2_rep2b/HiC05_L3_50U_S1_R1_val_1.fq.gz /mnt/beegfs/home1/ARC-C/fastq/N2_rep2b/HiC05_L3_50U_S1_R2_val_2.fq.gz /mnt/beegfs/home1/ARC-C/hicpipe/N2_rep2b
May I ask what the solution is? Thanks!

Anyone know what's causing this Linux error?

I'm trying to run DeepVariant via its Singularity container on the HPC; however, I get this error and I can't figure it out!
Code:
#!/bin/bash --login
#SBATCH -J AmyHouseman_deepvariant
#SBATCH -o %x.stdout.%J.%N
#SBATCH -e %x.stderr.%J.%N
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p c_compute_wgp
#SBATCH --account=scw1581
#SBATCH --mail-type=ALL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=HousemanA@cardiff.ac.uk # Where to send mail
#SBATCH --array=1-33
#SBATCH --time=02:00:00
#SBATCH --time=072:00:00
#SBATCH --mem-per-cpu=32GB
module purge
module load singularity
module load parallel
set -eu
cd /scratch/c.c21087028/
BIN_VERSION="1.3.0"
singularity pull docker://google/deepvariant:"${BIN_VERSION}"
sed -n "${SLURM_ARRAY_TASK_ID}p" Polyposis_Exome_Analysis/fastp/All_fastp_input/List_of_33_exome_IDs | parallel -j 1 "singularity run singularity run -B /usr/lib/locale/:/usr/lib/locale/ \
docker://google/deepvariant:"${BIN_VERSION}" \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WES \
--ref=Polyposis_Exome_Analysis/bwa/index/HumanRefSeq/GRCh38_latest_genomic.fna \
--reads=Polyposis_Exome_Analysis/samtools/index/indexed_picardbamfiles/{}PE_markedduplicates.bam \
--output_vcf=Polyposis_Exome_Analysis/deepvariant/vcf/{}PE_output.vcf.gz \
--output_gvcf=Polyposis_Exome_Analysis/deepvariant/gvcf/{}PE_output.vcf.gz \
--intermediate_results_dir=Polyposis_Exome_Analysis/deepvariant/intermediateresults/{}PE_output_intermediate"
Error:
FATAL: While making image from oci registry: error fetching image to cache: failed to get checksum for docker://google/deepvariant:1.3.0: pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp 52.0.218.102:443: connect: network is unreachable
I've asked a lot of people, and I'm still stuck! Thanks, Amy

Problem running COMSOL in a cluster with SLURM

I am trying to submit this job to a cluster with Slurm via a .sh script, using the COMSOL software:
#!/bin/bash
#SBATCH --job-name=my_work
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --mem=20G
#SBATCH --partition=my_partition
#SBATCH --time=4-0
#SBATCH --no-requeue
#SBATCH --exclusive
#SBATCH -D $HOME
#SBATCH --output=Lecho1_%j.out
#SBATCH --error=Lecho1_%j.err
cd /home/myuser/myfile/
module load intel/2019b
module load OpenMPI/4.1.1
module load COMSOL/5.5.0
comsol batch -mpibootstrap slurm -nn 20 -nnhost 20 -inputfile myfile.mph \
    -outputfile myfile.outout.mph -study std1 -batchlog myfile.mph.log
and when doing so I get the following error message:
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1743)......: channel initialization failed
MPID_Init(2137)......: PMI_Init returned -1
Can anyone tell me what it means and how to fix it?
The way you call COMSOL is incorrect. The submission script should contain the following lines to run COMSOL on a cluster with Slurm:
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=COMSOL_JOB
#SBATCH --mem=200gb
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load COMSOL/5.5
comsol batch -mpirmk pbs -job b1 -alivetime 15 -recover \
-inputfile "mymodel.mph" -outputfile "mymodel.mph.out" \
-batchlog "mymodel.mph.log"

Running a command on SLURM that takes command-line arguments

I'm completely new to using HPCs and SLURM, so I'd really appreciate some guidance here.
I need to iteratively run a command that looks like this
kallisto quant -i '/home/myName/genomes/hSapien.idx' \
-o "output-SRR3225412" \
"SRR3225412_1.fastq.gz" \
"SRR3225412_2.fastq.gz"
where the SRR3225412 part will be different in each iteration.
The problem is, as I found out, that I can't just append this to the end of an sbatch command:
sbatch --nodes=1 \
--ntasks-per-node=1 \
--cpus-per-task=1 \
kallisto quant -i '/home/myName/genomes/hSapien.idx' \
-o "output-SRR3225412" \
"SRR3225412_1.fastq.gz" \
"SRR3225412_2.fastq.gz"
This command doesn't work; I get the error:
sbatch: error: This does not look like a batch script. The first
sbatch: error: line must start with #! followed by the path to an interpreter.
sbatch: error: For instance: #!/bin/sh
I wanted to ask: how do I run the sbatch command, specifying its run parameters, while also passing the command-line arguments for the kallisto program I'm trying to use? In the end I'd like to have something like:
#!/bin/bash
for sample in ...
do
sbatch --nodes=1 \
--ntasks-per-node=1 \
--cpus-per-task=1 \
kallistoCommandOnSample --arg1 a1 \
--arg2 a2 arg3 a3
done
The error sbatch: error: This does not look like a batch script. arises because sbatch expects a submission script: a batch script, typically a Bash script, in which comments starting with #SBATCH are interpreted by Slurm as options.
So the typical way of submitting a job is to create a file, let's name it submit.sh:
#! /bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
kallisto quant -i '/home/myName/genomes/hSapien.idx' \
-o "output-SRR3225412" \
"SRR3225412_1.fastq.gz" \
"SRR3225412_2.fastq.gz"
and then submit it with
sbatch submit.sh
If you have multiple similar jobs to submit, it is beneficial for several reasons to use a job array. The loop you want to create can be replaced with a single submission script looking like this:
#! /bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --array=1-10 # Replace here with the number of iterations in the loop
SAMPLES=(...) # here put what you would loop over
CURRSAMPLE=${SAMPLES[$SLURM_ARRAY_TASK_ID-1]} # Bash arrays are 0-indexed while the task IDs start at 1
kallisto quant -i '/home/myName/genomes/hSapien.idx' \
-o "output-${CURRSAMPLE}" \
"${CURRSAMPLE}_1.fastq.gz" \
"${CURRSAMPLE}_2.fastq.gz"
As pointed out by Carles Fenoy, if you do not want to use a submission script, you can use the --wrap parameter of sbatch:
sbatch --nodes=1 \
--ntasks-per-node=1 \
--cpus-per-task=1 \
--wrap "kallisto quant -i '/home/myName/genomes/hSapien.idx' \
-o 'output-SRR3225412' \
'SRR3225412_1.fastq.gz' \
'SRR3225412_2.fastq.gz'"

Starting n Spark worker nodes corresponding to n+1 slurm tasks

I am using an HPC cluster where the job manager is Slurm. After getting 6 Slurm tasks using the following directives:
#SBATCH -p batch
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=3
#SBATCH --cpus-per-task=4
I would like to have 1 Spark Master and 5 Spark Worker nodes deployed.
Currently, I am using a script where one master and one worker are deployed on one node (using a single Slurm task) and each of the remaining tasks deploys exactly one worker. How would I modify this script so that multiple workers can run on the same node (one per task)? In the current script, the workers are assigned a predefined static port number, so only the IP address together with the port number can differentiate a worker; for that reason no more than one worker per node can be uniquely identified, and the current script is only useful when --ntasks-per-node is 1.
How would I dynamically allocate ports while making sure that the rest of the script still works as expected?
#!/bin/bash -l
#SBATCH -J bda-job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=28
#SBATCH --time=01:30:00
#SBATCH -p batch
#SBATCH --qos qos-batch
#SBATCH -o %x-%j.log
### Load latest available Spark
module load devel/Spark
### If you do not wish tmp dirs to be cleaned
### at the job end, set below to 0
export SPARK_CLEAN_TEMP=1
### START INTERNAL CONFIGURATION
## CPU and Memory settings
export SPARK_WORKER_CORES=${SLURM_CPUS_PER_TASK}
export DAEMON_MEM=4096
export NODE_MEM=$((4096*${SLURM_CPUS_PER_TASK}-${DAEMON_MEM}))
export SPARK_DAEMON_MEMORY=${DAEMON_MEM}m
export SPARK_NODE_MEM=${NODE_MEM}m
## Set up job directories and environment variables
export BDA_HOME_DIR="$HOME/bda"
export SPARK_JOB_DIR="$BDA_HOME_DIR/spark-jobs"
export SPARK_JOB="$BDA_HOME_DIR/spark-jobs/${SLURM_JOBID}"
mkdir -p "${SPARK_JOB}"
export SPARK_HOME=$EBROOTSPARK
export SPARK_WORKER_DIR=${SPARK_JOB}
export SPARK_LOCAL_DIRS=${SPARK_JOB}
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=9080
export SPARK_SLAVE_WEBUI_PORT=9081
export SPARK_INNER_LAUNCHER=${SPARK_JOB}/spark-start-all.sh
export SPARK_MASTER_FILE=${SPARK_JOB}/spark_master
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_ROOT_LOGGER="WARN,DRFA"
export SPARK_SUBMIT_OPTIONS="--conf spark.executor.memory=${SPARK_NODE_MEM} --conf spark.worker.memory=${SPARK_NODE_MEM}"
## Generate spark starter-script
cat << 'EOF' > ${SPARK_INNER_LAUNCHER}
#!/bin/bash
## Load configuration and environment
source "$SPARK_HOME/sbin/spark-config.sh"
source "$SPARK_HOME/bin/load-spark-env.sh"
if [[ ${SLURM_PROCID} -eq 0 ]]; then
    ## Start master in background
    export SPARK_MASTER_HOST=$(hostname)
    MASTER_NODE=$(scontrol show hostname ${SLURM_NODELIST} | head -n 1)
    echo "spark://${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT}" > "${SPARK_MASTER_FILE}"
    "${SPARK_HOME}/bin/spark-class" org.apache.spark.deploy.master.Master \
        --ip $SPARK_MASTER_HOST \
        --port $SPARK_MASTER_PORT \
        --webui-port $SPARK_MASTER_WEBUI_PORT &
    ## Start one slave with one less core than the others on this node
    export SPARK_WORKER_CORES=$((${SLURM_CPUS_PER_TASK}-1))
    "${SPARK_HOME}/bin/spark-class" org.apache.spark.deploy.worker.Worker \
        --webui-port ${SPARK_SLAVE_WEBUI_PORT} \
        spark://${MASTER_NODE}:${SPARK_MASTER_PORT} &
    ## Wait for background tasks to complete
    wait
else
    ## Start (pure) slave
    MASTER_NODE=spark://$(scontrol show hostname $SLURM_NODELIST | head -n 1):${SPARK_MASTER_PORT}
    "${SPARK_HOME}/bin/spark-class" org.apache.spark.deploy.worker.Worker \
        --webui-port ${SPARK_SLAVE_WEBUI_PORT} \
        ${MASTER_NODE}
fi
EOF
chmod +x ${SPARK_INNER_LAUNCHER}
## Launch SPARK and wait for it to start
srun ${SPARK_INNER_LAUNCHER} &
while [ -z "$MASTER" ]; do
sleep 5
MASTER=$(cat "${SPARK_MASTER_FILE}")
done
### END OF INTERNAL CONFIGURATION
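One possible way to get rid of the fixed worker port (a sketch only, not tested here) is to derive a per-task port from SLURM_LOCALID, the node-local rank of each task, inside the generated launcher, and hand it to the worker explicitly with --port:
## Replacement for the worker start-up inside ${SPARK_INNER_LAUNCHER}:
## each task on a node gets its own worker and web-UI port, so several
## workers can coexist on the same host (the BASE_* values are arbitrary).
BASE_WORKER_PORT=38000
BASE_WEBUI_PORT=9081
WORKER_PORT=$((BASE_WORKER_PORT + SLURM_LOCALID))
WORKER_WEBUI_PORT=$((BASE_WEBUI_PORT + SLURM_LOCALID))

MASTER_NODE=spark://$(scontrol show hostname "$SLURM_NODELIST" | head -n 1):${SPARK_MASTER_PORT}
"${SPARK_HOME}/bin/spark-class" org.apache.spark.deploy.worker.Worker \
    --port "${WORKER_PORT}" \
    --webui-port "${WORKER_WEBUI_PORT}" \
    "${MASTER_NODE}"
The master task (the one with SLURM_PROCID equal to 0) can keep its fixed master port; only the worker ports need to differ between tasks on the same node.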
