Sbatch job array with dependency - bash

I am running a job array in which I want each task (running the script job.sh) to have a dependency so that it is not released to the scheduler until the previous array task has started, and I can't figure out how to code this:
#!/bin/bash
#SBATCH --account=*******
#SBATCH --array=1-3
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=256M
#SBATCH --dependency=???????
./job.sh >output$SLURM_ARRAY_TASK_ID.txt
exit
This will later be scaled up to run 1000 versions of the code with a much larger time limit, memory request, and array size.
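The thread has no accepted answer, but one possible approach (a sketch of my own, not from the thread) is to drop the single --array directive and instead chain individually submitted jobs with --dependency=after:<jobid>, which only requires the previous job to have started, not finished. Here job_wrapper.sh is a hypothetical script that holds the #SBATCH resource lines and runs ./job.sh > output$1.txt, with $1 standing in for SLURM_ARRAY_TASK_ID:
#!/bin/bash
# Hypothetical submission wrapper: each job may enter the scheduler only
# after the previously submitted one has started (--dependency=after:...).
N=3                      # later scaled up to 1000
prev=""
for i in $(seq 1 "$N"); do
    if [ -z "$prev" ]; then
        prev=$(sbatch --parsable job_wrapper.sh "$i")
    else
        prev=$(sbatch --parsable --dependency=after:"$prev" job_wrapper.sh "$i")
    fi
done
Note that --array=1-1000%1 would also serialize the tasks, but it waits for each task to finish rather than merely to start.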

Related

Request maximum number of threads & cores on node via Slurm job scheduler

I have a heterogeneous cluster, containing either 14-core or 16-core CPUs (28 or 32 threads). I manage job submissions using Slurm. Some requirements:
It doesn't matter which CPU is used for a calculation.
I don't want to specify which CPU a job should go to.
A job should consume all available cores on the CPU (14 or 16).
I want mpirun to handle threading.
To illustrate the peculiarities of the problem, I show a job script that works on the 16-core CPUs:
#!/bin/bash
#SBATCH -J test
#SBATCH -o job.%j.out
#SBATCH -N 1
#SBATCH -n 32
mpirun -np 16 vasp
An example job script that works on the 14-core CPUs is:
#!/bin/bash
#SBATCH -J test
#SBATCH -o job.%j.out
#SBATCH -N 1
#SBATCH -n 28
mpirun -np 14 vasp
The second job script runs on the 16-core CPUs but, unfortunately, the job is about 35% slower than when I request 32 threads as is done in the first script. That's an unacceptable performance loss for my application.
I haven't figured out if there is a good way around this challenge. To me, a solution would be to request a variable number of resources, such as
#SBATCH -n [28-32]
and to tailor the mpirun -np x vasp line accordingly. I haven't found a way to do this, however. Are there any suggestions on how to achieve this directly in Slurm or is there a good workaround?
I tried to use the environment variable $SLURM_CPUS_ON_NODE, but this variable is only set after the node has been selected, so it cannot be used in a #SBATCH line.
I also looked at the --constraint flag but this does not seem to give sufficiently granular control over threading requests.
Actually it should work as you want it to by simply specifying that you want a full node:
#!/bin/bash
#SBATCH -J test
#SBATCH -o job.%j.out
#SBATCH -N 1
#SBATCH --exclusive
mpirun vasp
mpirun will start as many processes as defined in SLURM_TASKS_PER_NODE, which Slurm sets to the number of tasks that can be created on the node, i.e. the number of CPUs if you do not request more than one CPU per task.
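For reference, a minimal sketch (my addition, not part of the original answer) that makes the count explicit, assuming a single-node allocation so that SLURM_TASKS_PER_NODE is a plain number:
#!/bin/bash
#SBATCH -J test
#SBATCH -o job.%j.out
#SBATCH -N 1
#SBATCH --exclusive
# On one node SLURM_TASKS_PER_NODE is just a count (e.g. "32"); strip a
# possible "(xN)" multiplier suffix to be safe before passing it to mpirun.
NP=${SLURM_TASKS_PER_NODE%%(*}
mpirun -np "$NP" vasp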

How to run two scripts in parallel on the same node but different cores with sbatch?

I am new to Slurm and I want to run two scripts (each of them takes 2 minutes to run) at the same time (in parallel) on the same node and the same socket, but on different cores. I have a system where one node has 2 sockets, and each socket has 10 cores.
Based on what I have read in one other question (SLURM: How can I run different executables on parallel on the same compute node or in different nodes?) I came up with this code:
#!/bin/bash
#SBATCH -J script
#SBATCH --time=00:05:00
#SBATCH --exclusive
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=2
#SBATCH --partition=xeon
#SBATCH --output=OUTPUT.txt
#SBATCH --hint=nomultithread
ml intel
module load openmpi
srun -c 1 --exclusive ./runScript1 &
srun -c 1 --exclusive ./runScript2 &
wait
But when I repeatedly run squeue --job [jobID], I see that it takes 4 minutes for both scripts to finish, which makes me think that they run sequentially (the second one starts only after the first has finished).
I have also tried to use taskset to select a specific core, but I got errors.
I am running the script from above using sbatch.
Please tell me if I am assuming or doing something wrong.
Is it possible to select a specific core to run on, by including taskset or the --cpu-bind option in my code?
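No answer is recorded here, but as a hedged sketch of the --cpu-bind idea (assuming a reasonably recent Slurm and that runScript1/runScript2 sit in the submission directory), explicit core pinning could look like this:
#!/bin/bash
#SBATCH -J script
#SBATCH --time=00:05:00
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread
# Pin each step to one specific core; core IDs 0 and 1 are an assumption and
# depend on the node's topology (check with lscpu or slurmd -C).
srun -n1 -c1 --cpu-bind=map_cpu:0 ./runScript1 &
srun -n1 -c1 --cpu-bind=map_cpu:1 ./runScript2 &
wait
On newer Slurm releases, adding --exact to each srun (or --exclusive on older ones) helps ensure the two steps get disjoint CPUs instead of waiting on each other.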

How can I run jobs every N minutes?

I'm new here and hope to ask the question correctly.
I'm working on a server that uses Slurm as its scheduler. I have many samples on which I have to run a script, but to avoid overlapping jobs and clogging the CPUs I would like to start each job two hours after the previous one.
These are the resources I have allocated within the script.
#!/bin/sh
#SBATCH --job-name=bwa
#SBATCH --output=%x_%j.log
#SBATCH --error=%x_%j.err
#SBATCH --mem-per-cpu=2G
#SBATCH -n 128
#SBATCH -N 1
#SBATCH --time=10:00:00
And this is how I run the loop in the folder where the files to be analyzed are present.
for i in *bam; do sbatch pipeline.sh $i; done
How can I make each job start two hours after the previous one has started?
Any advice or suggestion is welcome, both for the resources allocated and for the way of running the jobs.
Thanks in advance to everyone.
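The thread contains no answer, but one common pattern (a sketch, not a verified solution) is to stagger the submissions with --begin, which delays the moment each job becomes eligible to start:
#!/bin/bash
# Submit one job per BAM file; each submission becomes eligible to start
# 120 minutes later than the previous one.
offset=0
for i in *bam; do
    if [ "$offset" -eq 0 ]; then
        sbatch pipeline.sh "$i"
    else
        sbatch --begin=now+${offset}minutes pipeline.sh "$i"
    fi
    offset=$((offset + 120))
done
Note that --begin staggers eligibility by wall-clock time; if each job must start two hours after the previous one has actually started, chaining the submissions with --dependency=after:<previous jobid>+120 is an alternative on Slurm versions that support the +time suffix.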

Slurm: how many times will failed jobs be --requeue'd

I have a Slurm job array for which the job file includes a --requeue directive. Here is the full job file:
#!/bin/bash
#SBATCH --job-name=catsss
#SBATCH --output=logs/cats.log
#SBATCH --array=1-10000
#SBATCH --requeue
#SBATCH --partition=scavenge
#SBATCH --mem=32g
#SBATCH --time=24:00:00
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=douglas.duhaime#gmail.com
module load Langs/Python/3.4.3
python3 cats.py ${SLURM_ARRAY_TASK_ID} 'cats'
Several of the array values have restarted at least once. I would like to know, how many times will these jobs restart before they are finally cancelled by the scheduler? Will the restarts carry on indefinitely until a sysadmin manually cancels them, or do jobs like this have a maximum number of retries?
As far as I know, jobs can be requeued an unlimited number of times. You only decide whether the job is allowed to be requeued at all: with --no-requeue it will never be requeued; with --requeue it will be requeued every time the system decides it is necessary (node failure, preemption by a higher-priority job, and so on).
The jobs keep restarting until they finish (successfully or not, but finished rather than interrupted).
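If you want to impose your own ceiling on the number of restarts, a possible sketch (my addition, relying on the documented SLURM_RESTART_COUNT variable) is to bail out early at the top of the job script:
# SLURM_RESTART_COUNT is unset on the first run and incremented on each requeue.
MAX_RESTARTS=3
if [ "${SLURM_RESTART_COUNT:-0}" -ge "$MAX_RESTARTS" ]; then
    echo "Task ${SLURM_ARRAY_TASK_ID} exceeded ${MAX_RESTARTS} restarts, giving up" >&2
    exit 1
fi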

Running slurm script with multiple nodes, launch job steps with 1 task

I am trying to launch a large number of job steps using a batch script. The different steps can be completely different programs, and each needs exactly one CPU. First I tried doing this using the --multi-prog argument to srun. Unfortunately, when using all CPUs assigned to my job in this manner, performance degrades massively; the run time increases to almost its serialized value. By undersubscribing I could ameliorate this a little. I couldn't find anything online regarding this problem, so I assumed it to be a configuration problem of the cluster I am using.
So I tried going a different route. I implemented the following script (launched via sbatch my_script.slurm):
#!/bin/bash
#SBATCH -o $HOME/slurm/slurm_out/%j.%N.out
#SBATCH --error=$HOME/slurm/slurm_out/%j.%N.err_out
#SBATCH --get-user-env
#SBATCH -J test
#SBATCH -D $HOME/slurm
#SBATCH --export=NONE
#SBATCH --ntasks=48
NR_PROCS=$(($SLURM_NTASKS))
for PROC in $(seq 0 $(($NR_PROCS-1)));
do
#My call looks like this:
#srun --exclusive -n1 bash $PROJECT/call_shells/call_"$PROC".sh &
srun --exclusive -n1 hostname &
pids[${PROC}]=$! #Save PID of this background process
done
for pid in ${pids[*]};
do
wait ${pid} #Wait on all PIDs, this returns 0 if ANY process fails
done
I am aware that the --exclusive argument is not really needed in my case. The shell scripts called contain the different binaries and their arguments. The remaining part of my script relies on all processes having finished, hence the wait. I changed the calling line to make it a minimal working example.
At first this seemed to be the solution. Unfortunately when increasing the number of nodes used in my job allocation (for example by increasing --ntasks to a number larger than the number of CPUs per node in my cluster), the script does not work as expected anymore, returning
srun: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1
and continuing with only one node (i.e. 48 CPUs in my case), which goes through the job steps as fast as before; all processes on the other node(s) are subsequently killed.
This seems to be the expected behaviour, but I can't really understand it. Why does every job step in a given allocation need to include a minimum number of tasks equal to the number of nodes in the allocation? I ordinarily do not care at all about the number of nodes used in my allocation.
How can I implement my batch script, so it can be used on multiple nodes reliably?
Found it! The nomenclature and the many command-line options to Slurm confused me. The solution is given by
#!/bin/bash
#SBATCH -o $HOME/slurm/slurm_out/%j.%N.out
#SBATCH --error=$HOME/slurm/slurm_out/%j.%N.err_out
#SBATCH --get-user-env
#SBATCH -J test
#SBATCH -D $HOME/slurm
#SBATCH --export=NONE
#SBATCH --ntasks=48
NR_PROCS=$(($SLURM_NTASKS))
for PROC in $(seq 0 $(($NR_PROCS-1)));
do
#My call looks like this:
#srun --exclusive -N1 -n1 bash $PROJECT/call_shells/call_"$PROC".sh &
srun --exclusive -N1 -n1 hostname &
pids[${PROC}]=$! #Save PID of this background process
done
for pid in ${pids[*]};
do
wait ${pid} #Wait on all PIDs, this returns 0 if ANY process fails
done
This tells srun to run each job step on exactly one node with a single task.
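A version-dependent caveat (my addition, not part of the original answer): on Slurm 21.08 and later, step resource handling changed, and --exact is the option that gives each step only the CPUs it asks for; steps can also queue on memory unless it is requested explicitly. If the pattern above serializes on a newer cluster, a variation along these lines may help (treat it as a sketch):
# On newer Slurm, --exact requests exactly the resources specified so that
# concurrent steps do not block each other; the memory value is an arbitrary
# example to keep steps from serializing on the job's memory allocation.
srun --exact -N1 -n1 --mem-per-cpu=1G hostname &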
