Slurm - running multiple tasks at one node - multiprocessing

Suppose I want to run a program with 32 different input arguments. This is what I would do on my laptop, for example:
for i in {1..32}
do
./test.sh $i
done
where test.sh is just a dummy program:
#!/bin/bash
name=$(hostname)
touch $1.test
echo $name >> $1.test
echo $name
sleep 5
If I run this it takes approx. 160 s (32 × 5 s). I try to run it as a Slurm job on a cluster of 40 nodes with 4 CPUs each, using this test.slurm:
#! /bin/bash
#
#SBATCH --ntasks=4
start=`date +%s`
for i in {1..32}
do
srun -l -n1 -N1 -c1 ./test.sh $i &
done
wait
end=`date +%s`
runtime=$((end-start))
echo $runtime
I get a runtime of 190 seconds instead of the expected ~40, so there is no multiprocessing. But if I specify 2 nodes with --nodes=2, or more than 4 tasks (i.e. a second node has to be allocated), the runtime drops to about 90 seconds.
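The post ends without a resolution. Purely as a sketch of the kind of change Slurm guides often suggest for packing many small steps into one allocation (not something the poster reports trying), each step can be marked exclusive so it gets its own CPUs from the allocation:
#! /bin/bash
#SBATCH --ntasks=4
start=$(date +%s)
for i in {1..32}
do
    # --exclusive asks Slurm to dedicate separate CPUs to each step so the
    # four steps can run side by side; recent Slurm releases spell the
    # step-level variant --exact.
    srun --exclusive -n1 -N1 -c1 ./test.sh "$i" &
done
wait
end=$(date +%s)
echo $((end-start))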

Related

How to submit a job array in Hoffman2 if I have a limit of 500 jobs?

I need to submit a job array of 100'000 jobs in Hoffman2. I have a limit of 500. Thus, starting with job 500, I get the following error:
Unable to run job: job rejected: Only 500 jobs are allowed per user (current job count: 500). Job of user "XX" would exceed that limit. Exiting.
Right now the submission Bash code is:
#!/bin/bash
#$ -cwd
#$ -o test.joblog.LOOP.$JOB_ID
#$ -j y
#$ -l h_data=3G,h_rt=02:00:00
#$ -m n
#$ -t 1-100000
echo "STARTING TIME -- $(date) "
echo "$SGE_TASK_ID "
/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh $SGE_TASK_ID
I tried to modify my code according to some Slurm documentation, but apparently it does not work for Hoffman2 (in Slurm, adding % lets you set the number of simultaneously running jobs):
#$ -cwd
#$ -o test.joblog.LOOP.$JOB_ID
#$ -j y
#$ -l h_data=3G,h_rt=02:00:00
#$ -m n
#$ -t 1-100000%500
echo "STARTING TIME -- $(date) "
echo "$SGE_TASK_ID "
/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh $SGE_TASK_ID
Do you know how I can modify my submission Bash code so that I always have 500 running jobs?
Assuming that your job is visible as
/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh
in ps -e, you could try something quick and dirty like:
#!/bin/bash
maxproc=490
while : ; do
    # count how many submission processes are currently visible
    qproc=$(ps -e | grep '/u/systems/UGE8.6.4/bin/lx-amd64/qsub submit_job.sh' | wc -l)
    if [ "$qproc" -lt "$maxproc" ] ; then
        submission_code   # with correct arguments
    fi
    sleep 10   # or any interval that you feel is appropriate
done
Of course, this only shows the principle; you may need to test whether other submission processes have to be accounted for, and I assumed the submission code backgrounds itself. There are more caveats, but you get the idea.
A possible approach (free of busy waiting and ugliness of that kind) is to track the number of jobs on the client side, cap their total count at 500 and, each time any of them finishes, immediately start a new one to replace it. (This is, however, based on the assumption that the client script outlives the jobs.) Concrete steps:
Make the qsub tool block and (passively) wait for the completion of its remote job. Depending on the particular qsub implementation, it may have a -sync flag or something more complex may be needed.
Keep exactly 500, no more and, if possible, no fewer waiting instances of qsub. This can be automated by using this answer or this answer and setting MAX_PARALLELISM to 500 there. qsub itself would be started from the do_something_and_maybe_fail() function.
Here’s a copy&paste of the Bash outline from the answers linked above, just to make this answer more self-contained. Starting with a trivial and runnable harness / dummy example (with a sleep instead of a qsub -sync):
#!/bin/bash
set -euo pipefail
declare -ir MAX_PARALLELISM=500  # pick a limit
declare -i pid
declare -a pids=()
do_something_and_maybe_fail() {
    ### qsub -sync "$@" ... ###  # add the real thing here
    sleep $((RANDOM % 10))       # remove this :-)
    return $((RANDOM % 2 * 5))   # remove this :-)
}
for pname in some_program_{a..j}{00..60}; do  # 610 items
    if ((${#pids[@]} >= MAX_PARALLELISM)); then
        wait -p pid -n \
            && echo "${pids[pid]} succeeded" 1>&2 \
            || echo "${pids[pid]} failed with ${?}" 1>&2
        unset 'pids[pid]'
    fi
    do_something_and_maybe_fail &  # forking here
    pids[$!]="${pname}"
    echo "${#pids[@]} running" 1>&2
done
for pid in "${!pids[@]}"; do
    wait -n "$((pid))" \
        && echo "${pids[pid]} succeeded" 1>&2 \
        || echo "${pids[pid]} failed with ${?}" 1>&2
done
The first loop needs to be adjusted for the specific use case. An example follows, assuming that the right do_something_and_maybe_fail() implementation is in place and that one_command_per_line.txt is a list of arguments for qsub, one invocation per line, with an arbitrary number of lines. (The script could accept a file name as an argument or just read the commands from standard input, whatever works best.) The rest of the script would look exactly like the boilerplate above, keeping the number of parallel qsubs at MAX_PARALLELISM at most.
while read -ra job_args; do
    if ((${#pids[@]} >= MAX_PARALLELISM)); then
        wait -p pid -n \
            && echo "${pids[pid]} succeeded" 1>&2 \
            || echo "${pids[pid]} failed with ${?}" 1>&2
        unset 'pids[pid]'
    fi
    do_something_and_maybe_fail "${job_args[@]}" &  # forking here
    pids[$!]="${job_args[*]}"
    echo "${#pids[@]} running" 1>&2
done < /path/to/one_command_per_line.txt
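For the Hoffman2 case specifically, a minimal sketch of do_something_and_maybe_fail() could look like the following, assuming the installed UGE qsub supports Grid Engine's -sync y option (which makes qsub block until the remote job finishes); the path and script name are taken from the question and may need adjusting:
do_something_and_maybe_fail() {
    # Block until the submitted job completes; with -sync y the qsub
    # exit status generally reflects the job's own exit status.
    /u/systems/UGE8.6.4/bin/lx-amd64/qsub -sync y submit_job.sh "$@"
}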

Inside a conda environment, GNU parallel stops starting new jobs when I close the terminal

It is inside a Conda environment (prokka; so, actually, I am using the parallel version that is a prokka dependency).
My code is written in a file named test.sh, and then I run:
nohup parallel -k -j 10 < test.sh >test.log &
while test.sh contains:
echo file1; sleep 50
echo file2; sleep 50
echo file3; sleep 50
...
echo file99; sleep 50
When I closed the terminal, it said parallel will not start new jobs:
$ cat test.log
parallel: SIGHUP received. No new jobs will be started.
parallel: Waiting for these 10 jobs to finish. Send SIGTERM to stop now.
parallel: echo file1; sleep 50
parallel: echo file2; sleep 50
parallel: echo file3; sleep 50
parallel: echo file4; sleep 50
parallel: echo file5; sleep 50
parallel: echo file6; sleep 50
parallel: echo file7; sleep 50
parallel: echo file8; sleep 50
parallel: echo file9; sleep 50
parallel: echo file10; sleep 50
file1
file2
file3
file4
file5
file6
file7
file8
file9
file10
Is the code wrong? I can't find an explanation on the internet. The parallel version I am using is 20201122.
Follow-up: it was dealt with when I just exited while in the base environment of conda.
Is it because
"SIGTERM is changed to SIGHUP, so sending SIGHUP will make GNU Parallel start no more jobs, but wait for running jobs to finish." ?
If you log out, then the SIGHUP signal will be sent to parallel. It will not start more jobs.
https://savannah.gnu.org/forum/forum.php?forum_id=9401
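A common workaround (my suggestion, not part of the answer above) is to keep the SIGHUP from reaching parallel in the first place, either by starting it in its own session with setsid or by running it under tmux/screen; disowning the nohup'ed job also stops the interactive shell from forwarding the signal on exit:
# start parallel in a new session so closing the terminal never delivers SIGHUP to it
setsid parallel -k -j 10 < test.sh > test.log 2>&1 &
# or: keep the nohup version, but remove the job from the shell's job table
nohup parallel -k -j 10 < test.sh > test.log &
disown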

SLURM sbatch script not running all srun commands in while loop

I'm trying to submit multiple jobs in parallel as a preprocessing step in sbatch using srun. The loop reads a file containing 40 file names and uses "srun command" on each file. However, not all files are sent off with srun, and the rest of the sbatch script continues after the ones that did get submitted finish. The real sbatch script is more complicated and I can't use job arrays with it, so that approach won't work. This part should be pretty straightforward though.
I made this simple test case as a sanity check and it does the same thing. For every file name in the file list (40) it creates a new file containing 'foo' in it. Every time I submit the script with sbatch it results in a different number of files being sent off with srun.
#!/bin/sh
#SBATCH --job-name=loop
#SBATCH --nodes=5
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=1G
#SBATCH -A zheng_lab
#SBATCH -p exacloud
#SBATCH --error=/home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel/log_files/test.%J.err
#SBATCH --output=/home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel/log_files/test.%J.out
DIR=/home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel
SAMPLES=$DIR/samples.txt
OUT_DIR=$DIR/test_out
FOO_FILE=$DIR/foo.txt
# Create output directory
srun -N 1 -n 1 -c 1 mkdir $OUT_DIR
# How many files to run
num_files=$(srun -N 1 -n 1 -c 1 wc -l $SAMPLES)
echo "Number of input files: " $num_files
# Create a new file for every file in listing (run 5 at a time, 1 for each node)
while read F ;
do
fn="$(rev <<< "$F" | cut -d'/' -f 1 | rev)" # Remove path for writing output to new directory
echo $fn
srun -N 1 -n 1 -c 1 cat $FOO_FILE > $OUT_DIR/$fn.out &
done <$SAMPLES
wait
# How many files actually got created
finished=$(srun -N 1 -n 1 -c 1 ls -lh $OUT_DIR/*out | wc -l)
echo "Number of files submitted: " $finished
Here is my output log file the last time I tried to run it:
Number of input files: 40 /home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel/samples.txt
sample1
sample2
sample3
sample4
sample5
sample6
sample7
sample8
Number of files submitted: 8
The issue is that srun redirects its stdin to the tasks it starts, so the contents of $SAMPLES are consumed, in an unpredictable way, by all the cat commands that are started.
Try with
srun --input none -N 1 -n 1 -c 1 cat $FOO_FILE > $OUT_DIR/$fn.out &
The --input none parameter will tell srun to not mess with stdin.
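Applied to the loop in the question, only the srun line changes:
while read F ;
do
    fn="$(rev <<< "$F" | cut -d'/' -f 1 | rev)"   # Remove path for writing output to new directory
    echo $fn
    # --input none stops srun from forwarding the script's stdin ($SAMPLES)
    # to the cat task it launches
    srun --input none -N 1 -n 1 -c 1 cat $FOO_FILE > $OUT_DIR/$fn.out &
done <$SAMPLES
wait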

Multiple shell script workers

We'd like to interpret tons of coordinates and do something with them using multiple workers.
What we got:
coords.txt
100, 100, 100
244, 433, 233
553, 212, 432
776, 332, 223
...
8887887, 5545554, 2243234
worker.sh
coord_reader='^([0-9]+), ([0-9]+), ([0-9]+)$'
while IFS='' read -r line || [[ -n "$line" ]]; do
if [[ $line =~ $coord_reader ]]; then
x=${BASH_REMATCH[1]}
y=${BASH_REMATCH[2]}
z=${BASH_REMATCH[3]}
echo "x is $x, y is $y, z is $z"
fi
done < "$1"
To execute worker.sh we call bash worker.sh coords.txt
Because we have millions of coordinates, we need to split coords.txt and create multiple workers doing the same task, e.g. coordsaa, coordsab, coordsac, with one worker per file.
So we split coords.txt using split.
split -l 1000 coords.txt coords
But, how to assign one file per worker?
I am new to stackoverflow, feel free to comment so I can improve my asking skills.
To run workers from bash to process a lot of files:
File layout:
files/ runner.sh worker.sh
files/: a folder with many files (for example 1000)
runner.sh: launches many workers
worker.sh file: processes one file
For example:
worker.sh:
#!/usr/bin/env bash
sleep 5
echo $1
To process all files in files/, one file per worker, do:
runner.sh:
#!/usr/bin/env bash
n_processes=$(find files/ -type f | wc -l)
echo "spawning ${n_processes}"
for file in $(find files/ -type f); do
    bash worker.sh "${file}" &
done
wait
/!\ 1000 processes is a lot !!
It is better to create a "pool of processes" (here it only guarantees a maximum number of processes running at the same time; an old child process is not reused for a new task but dies when its task is done or fails):
#!/usr/bin/env bash
n_processes=8
echo "max number of processes: ${n_processes}"
for file in $(find files/ -type f); do
    # wait until fewer than n_processes background jobs are running
    while [[ $(jobs -r | wc -l) -ge ${n_processes} ]]; do
        sleep 0.2   # avoid busy-spinning on a CPU while waiting for a free slot
    done
    bash worker.sh "${file}" &
    echo "process pid: $! started"
done
wait
It is not really a pool of processes, but it avoids having too many processes alive at the same time; the maximum number of processes alive at once is given by n_processes.
Execute bash runner.sh.
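Applied to the split output from the question (split's default two-letter suffixes give coordsaa, coordsab, ...), the same bounded-spawn pattern would look roughly like this; the cap of 8 workers is an arbitrary choice:
#!/usr/bin/env bash
n_processes=8
for file in coords??; do                  # coordsaa, coordsab, ... from split
    while [[ $(jobs -r | wc -l) -ge ${n_processes} ]]; do
        sleep 0.5                         # wait for a free worker slot
    done
    bash worker.sh "${file}" &
done
wait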
I would do this with GNU Parallel. Say you want 8 workers running at a time till all the processing is done:
parallel -j 8 --pipepart -a coords.txt --fifo bash worker.sh {}
where:
-j 8 means "keep 8 jobs running at a time"
--pipepart means "split the input file into parts"
-a coords.txt means "this is the input file"
--fifo means "create a temporary fifo to send the data to, and save its name in {} to pass to your worker script"
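If you have already split the file as in the question, an equivalent and simpler invocation skips --pipepart and hands one chunk file to each worker (the ::: syntax is standard GNU Parallel):
# at most 8 worker.sh instances at a time, one split chunk each
parallel -j 8 bash worker.sh ::: coords??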

How to run a fixed number of processes in a loop?

I have a script like this:
#!/bin/bash
for i=1 to 200000
do
create input file
run ./java
done
I need to run a number (8 or 16) of processes (java) at the same time and I don't know how. I know that wait could help, but it should keep 8 processes running at all times rather than waiting for the first 8 to finish before starting the next 8.
bash 4.3 added a useful new flag to the wait command, -n, which causes wait to block until any single background job completes, rather than waiting for a particular job (or for all of them).
#!/bin/bash
cores=8  # or 16, or whatever
for ((i=1; i <= 200000; i++))
do
    # create input file and run java in the background.
    ./java &
    # Check how many background jobs there are, and if it
    # is equal to the number of cores, wait for any one to
    # finish before continuing.
    background=( $(jobs -p) )
    if (( ${#background[@]} == cores )); then
        wait -n
    fi
done
There is a small race condition: if you are at maximum load but a job completes after you run jobs -p, you'll still block until another job completes. There's not much you can do about this, but it shouldn't present too much trouble in practice.
Prior to bash 4.3, you would need to poll the set of background jobs periodically to see when the pool dropped below your threshold.
while :; do
    background=( $(jobs -p) )
    if (( ${#background[@]} < cores )); then
        break
    fi
    sleep 1
done
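Put together, a pre-4.3 version of the full loop might look like this sketch of mine (the one-second polling interval is arbitrary):
#!/bin/bash
cores=8  # or 16, or whatever
for ((i=1; i <= 200000; i++))
do
    # create input file and run java in the background.
    ./java &
    # Poll until the number of running background jobs drops below the cap.
    while :; do
        background=( $(jobs -p) )
        if (( ${#background[@]} < cores )); then
            break
        fi
        sleep 1
    done
done
wait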
Use GNU Parallel like this, simplified to 20 jobs rather than 200,000, with echo standing in for "create input file" and sleep standing in for running java.
seq 1 20 | parallel -j 8 -k 'echo {}; sleep 2'
The -j 8 says how many jobs to run at once. The -k says to keep the output in order.
With a non-ancient version of GNU utilities or on *BSD/OSX, use xargs with the -P option to run processes in parallel.
#!/bin/bash
seq 200000 | xargs -P 8 -n 1 mytask
where mytask is an auxiliary script, with the sequence number (the input line) available as the argument $1:
#!/bin/bash
echo "Task number $1"
# create input file
# run ./java
You can put everything in one script if you want:
#!/bin/bash
seq 200000 | xargs -P 8 -n 1 sh -c '
echo "Task number $1"
# create input file
# run ./java
' mytask
If your system doesn't have seq, you can use the bash snippet
for ((i=1; i<=200000; i++)); do echo "$i"; done
or other shell tools such as
awk 'BEGIN {for (i=1; i<=200000; i++) print i}' </dev/null
or
</dev/zero tr '\0' '\n' | head -n 200000 | nl -ba
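As a quick sanity check of the -P behaviour (a toy example of mine, not from the answer), you can watch the start and end messages of four concurrent tasks interleave:
seq 8 | xargs -P 4 -n 1 sh -c 'echo "start $1"; sleep 1; echo "end $1"' sh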
Set up 8 subprocesses that read from a common stream; each subprocess reads one line of input and starts a new job whenever its current job completes.
forker () {
    while IFS= read -r i; do
        # create the input file for task "$i"
        ./java
    done
}
cores=8  # or 16, or whatever
for ((i=1; i<=200000; i++)); do
    echo "$i"
done | {
    # the subshell on the right of the pipe starts $cores forkers that
    # share the stream of task numbers, then waits for all of them
    for ((j=0; j<cores; j++)); do
        forker &
    done
    wait  # waiting for the $cores forkers to complete
}

Resources