How to show the status of all job steps defined in an sbatch script, including those not yet created due to resource contention

I'm using SLURM's sbatch to launch a bunch of parallel tasks on a cluster. The total number of cores I need to run all tasks in parallel exceeds the number of cores my sbatch script asks for, so some job steps won't run until others have finished.
Here's an example script that reflects my use case. Let's say each node in the cluster has 40 cores and I use sbatch to allocate 10 nodes, so I have 400 cores at my disposal. But I have 12 tasks to run, each asking for 40 cores, so they need a total of 480 cores to run in parallel.
#!/bin/bash
#SBATCH --cpus-per-task=40
#SBATCH --nodes=10
#below is a total of 12 invocations of srun
srun --cpus-per-task=40 --nodes=1 --ntasks=1 --job-name=first <executable> &
srun --cpus-per-task=40 --nodes=1 --ntasks=1 --job-name=second <executable> &
...
srun --cpus-per-task=40 --nodes=1 --ntasks=1 --job-name=twelfth <executable> &
wait
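I submit and inspect this roughly as follows (a minimal sketch; capturing the job ID with --parsable and the file name my_script.sh are just placeholders for convenience):
jobid=$(sbatch --parsable my_script.sh)   # my_script.sh is the script above
sacct -j "$jobid"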
My problem is that sacct won't show the status of all twelve job steps until every invocation of srun has obtained the resources it needs. How can I adjust my way of using SLURM so that, immediately after I submit my batch script, I can inspect the state of all twelve job steps?
Here's my current way of operation:
I call sbatch on the script above and then sacct -j <JobID>, as sketched. At first, only ten job steps show up in the output, all in the RUNNING state:
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
XXX          script     batch      (null)     0          RUNNING    0:0
XXX.0        first                 (null)     0          RUNNING    0:0
XXX.1        second                (null)     0          RUNNING    0:0
XXX.2        third                 (null)     0          RUNNING    0:0
XXX.3        fourth                (null)     0          RUNNING    0:0
XXX.4        fifth                 (null)     0          RUNNING    0:0
XXX.5        sixth                 (null)     0          RUNNING    0:0
XXX.6        seventh               (null)     0          RUNNING    0:0
XXX.7        eighth                (null)     0          RUNNING    0:0
XXX.8        nineth                (null)     0          RUNNING    0:0
XXX.9        tenth                 (null)     0          RUNNING    0:0
... and the log file (slurm-<JobID>.out) would tell me: srun: Job XXX step creation temporarily disabled, retrying (Requested nodes are busy)
When one job step finally completes, the log file prints a new line, srun: Step created for job XXX, and the output of sacct -j <JobID> looks like this (note there are now eleven job steps):
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
XXX          script     batch      (null)     0          RUNNING    0:0
XXX.0        first                 (null)     0          RUNNING    0:0
XXX.1        second                (null)     0          RUNNING    0:0
XXX.2        third                 (null)     0          RUNNING    0:0
XXX.3        fourth                (null)     0          RUNNING    0:0
XXX.4        fifth                 (null)     0          RUNNING    0:0
XXX.5        sixth                 (null)     0          RUNNING    0:0
XXX.6        seventh               (null)     0          RUNNING    0:0
XXX.7        eighth                (null)     0          COMPLETED  0:0
XXX.8        nineth                (null)     0          RUNNING    0:0
XXX.9        tenth                 (null)     0          RUNNING    0:0
XXX.10       eleventh              (null)     0          RUNNING    0:0
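For reference, the columns above are just sacct's default fields; spelled out explicitly, the query would be something like this (a sketch):
sacct -j <JobID> --format=JobID,JobName,Partition,Account,AllocCPUS,State,ExitCode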
It's possible I'm missing some options, as the SLURM manual is really unwieldy. I've already read "How to know the status of each process of one job in the slurm cluster manager?", but that does not solve my problem.
I'd appreciate suggestions on how to solve my problem, or on how to use SLURM in a more correct way.

Related

Run millions of list entries in PBS with the parallel tool

I have a huge (a few million entries) job list and want to run a Java-written tool to perform the feature comparisons. The tool completes one calculation in
real 0m0.179s
user 0m0.005s
sys 0m0.000s
Running on 5 nodes (each with 72 CPUs) under the PBS Torque scheduler with GNU Parallel, the tool runs fine and produces results, but since I set 72 jobs per node, it should run 72 x 5 jobs at a time, yet I can see only 25-35 jobs running!
Checking the CPU utilization on each node also shows low utilization.
I want to run 72 x 5 jobs or more at a time and produce the results by utilizing all the available resources (72 x 5 CPUs).
As mentioned, I have ~200 million jobs to run, and I would like to complete them faster (1-2 hours) by using/increasing the number of nodes/CPUs.
Current code, input and job state:
example.lst (it has ~300 million lines)
ZNF512-xxxx_2_N-THRA-xxtx_2_N
ZNF512-xxxx_2_N-THRA-xxtx_3_N
ZNF512-xxxx_2_N-THRA-xxtx_4_N
.......
cat job_script.sh
#!/bin/bash
#PBS -l nodes=5:ppn=72
#PBS -N job01
#PBS -j oe
#work dir
export WDIR=/shared/data/work_dir
cd $WDIR;
# use available 72 cpu in each node
export JOBS_PER_NODE=72
#gnu parallel command
parallelrun="parallel -j $JOBS_PER_NODE --slf $PBS_NODEFILE --wd $WDIR --joblog process.log --resume"
$parallelrun -a example.lst sh run_script.sh {}
cat run_script.sh
#!/bin/bash
# parallel command options
i=$1
data=/shared/TF_data
# create tmp dir and work in
TMP_DIR=/shared/data/work_dir/$i
mkdir -p $TMP_DIR
cd $TMP_DIR/
# get file name
mk=$(echo "$i" | cut -d- -f1-2)
nk=$(echo "$i" | cut -d- -f3-6)
#run a tool to compare the features of pair files
/shared/software/tool_v2.1/tool -s1 $data/inf_tf/$mk -s1cf $data/features/$mk-cf -s1ss $data/features/$mk-ss -s2 $data/inf_tf/$nk.pdb -s2cf $data/features/$nk-cf.pdb -s2ss $data/features/$nk-ss.pdb > $data/$i.out
# move output files
mv matrix.txt $data/glosa_tf/matrix/$mk"_"$nk.txt
mv ali_struct.pdb $data/glosa_tf/aligned/$nk"_"$mk.pdb
# move back and remove tmp dir
cd $TMP_DIR/../
rm -rf $TMP_DIR
exit 0
PBS submission
qsub job_script.sh
Logging in to one of the nodes: ssh ip-172-31-9-208
top - 09:28:03 up 15 min, 1 user, load average: 14.77, 13.44, 8.08
Tasks: 928 total, 1 running, 434 sleeping, 0 stopped, 166 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 98.4%id, 1.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 193694612k total, 1811200k used, 191883412k free, 94680k buffers
Swap: 0k total, 0k used, 0k free, 707960k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15348 ec2-user 20 0 16028 2820 1820 R 0.3 0.0 0:00.10 top
15621 ec2-user 20 0 169m 7584 6684 S 0.3 0.0 0:00.01 ssh
15625 ec2-user 20 0 171m 7472 6552 S 0.3 0.0 0:00.01 ssh
15626 ec2-user 20 0 126m 3924 3492 S 0.3 0.0 0:00.01 perl
.....
top on all of the nodes shows a similar state, and results are produced by running only ~26 jobs at a time!
I have an aws-parallelcluster setup with 5 nodes (each with 72 CPUs), the Torque scheduler, and GNU Parallel (March 2018 release).
Update
Introducing the new function that takes input on stdin and running the script in parallel works great and utilizes all the CPUs on the local machine.
However, when it runs over remote machines it produces:
parallel: Error: test.lst is neither a file nor a block device
MCVE:
A simple script that just echoes the list gives the same error when run on remote machines, but works fine on the local machine:
cat test.lst # contains list
DNMT3L-5yx2B_1_N-DNMT3L-5yx2B_2_N
DNMT3L-5yx2B_1_N-DNMT3L-6brrC_3_N
DNMT3L-5yx2B_1_N-DNMT3L-6f57B_2_N
DNMT3L-5yx2B_1_N-DNMT3L-6f57C_2_N
DNMT3L-5yx2B_1_N-DUX4-6e8cA_4_N
DNMT3L-5yx2B_1_N-E2F8-4yo2A_3_P
DNMT3L-5yx2B_1_N-E2F8-4yo2A_6_N
DNMT3L-5yx2B_1_N-EBF3-3n50A_2_N
DNMT3L-5yx2B_1_N-ELK4-1k6oA_3_N
DNMT3L-5yx2B_1_N-EPAS1-1p97A_1_N
cat test_job.sh # GNU parallel submission script
#!/bin/bash
#PBS -l nodes=1:ppn=72
#PBS -N test
#PBS -k oe
# introduce new function and Run from ~/
dowork() {
parallel sh test_work.sh {}
}
export -f dowork
parallel -a test.lst --env dowork --pipepart --slf $PBS_NODEFILE --block -10 dowork
cat test_work.sh # run/work script
#!/bin/bash
i=$1
data=pwd
#create temporary folder in current dir
TMP_DIR=$data/$i
mkdir -p $TMP_DIR
cd $TMP_DIR/
# split list
mk=$(echo "$i" | cut -d- -f1-2)
nk=$(echo "$i" | cut -d- -f3-6)
# echo list and save in echo_test.out
echo $mk, $nk >> $data/echo_test.out
cd $TMP_DIR/../
rm -rf $TMP_DIR
From your timing:
real 0m0.179s
user 0m0.005s
sys 0m0.000s
it seems the tool uses very little CPU power. When GNU Parallel runs local jobs it has an overhead of 10 ms of CPU time per job. Your jobs use 179 ms of wall-clock time and 5 ms of CPU time, so GNU Parallel's overhead accounts for quite a bit of the time spent.
The overhead is much worse when running jobs remotely. Here we are talking 10 ms plus the cost of running an ssh command, which can easily be on the order of 100 ms. With hundreds of millions of jobs, 100 ms per job adds up to thousands of CPU-hours of pure overhead, so the overhead alone would dwarf your 1-2 hour target.
So how can we minimize the number of ssh commands, and how can we spread the overhead over multiple cores?
First let us make a function that can take input on stdin and run the script - one job per CPU thread in parallel:
dowork() {
[...set variables here; that becomes particularly important when we run remotely...]
parallel sh run_script.sh {}
}
export -f dowork
Test that this actually works by running:
head -n 1000 example.lst | dowork
Then let us look at running jobs locally. This can be done similarly to what is described here: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Running-more-than-250-jobs-workaround
parallel -a example.lst --pipepart --block -10 dowork
This will split example.lst into 10 blocks per CPU thread, so on a machine with 72 CPU threads this will make 720 blocks. It will then start 72 doworks, and when one is done it will get another of the 720 blocks. The reason I chose 10 instead of 1 is that if one of the jobs gets stuck for a while, you are unlikely to notice it.
This should make sure 100% of the CPUs on the local machine are busy.
If that works, we need to distribute this work to remote machines:
parallel -j1 -a example.lst --env dowork --pipepart --slf $PBS_NODEFILE --block -10 dowork
In total this should start 10 sshs per CPU thread (i.e. 5*72*10), namely one for each block, with one running per server listed in $PBS_NODEFILE at a time.
Unfortunately this means that --joblog and --resume will not work. There is currently no way to make that work, but if it is valuable to you, contact me via parallel@gnu.org.
I am not sure what tool does, but if the copying takes most of the time and tool only reads the files, then you might just be able to symlink the files into $TMP_DIR instead of copying them.
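A hypothetical sketch of that idea, assuming tool picks its inputs up from the working directory (the paths mirror run_script.sh, but whether this helps depends entirely on how tool locates and uses its files):
# inside run_script.sh, after cd $TMP_DIR/:
# link the input files into the temporary directory instead of copying them
ln -s "$data/inf_tf/$mk" "$data/features/$mk-cf" "$data/features/$mk-ss" "$TMP_DIR/"
ln -s "$data/inf_tf/$nk.pdb" "$data/features/$nk-cf.pdb" "$data/features/$nk-ss.pdb" "$TMP_DIR/"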
A good indication of whether you can go faster is to look at top on the 5 machines in the cluster. If they are all using all cores at >90%, then you cannot expect to get it faster.

Limit CPU usage of a process in a loop

I am trying to execute ffmpeg in a loop over multiple files. I only want one instance to run at a time, and it should only use 50% of the CPU. I've been trying cpulimit, but it isn't playing nicely with the loop.
for i in {1..9}; do cpulimit -l 50 -- ffmpeg <all the options>; done
This spawns all nine jobs at once, and they are all owned by init so I have to open htop to kill them.
for i in {1..9}; do ffmpeg <all the options> & cpulimit -p $! -l 50; done
This hangs; Ctrl+C just continues to the next loop iteration, and these instances can only be killed with SIGKILL.
Using a queue is the way to go. A simple solution that I use is Task Spooler. You can limit the number of cores ffmpeg uses with -threads also. Here's some code for you:
ts sh -c "ffmpeg -i INPUT.mp4 -threads 4 OUTPUT.mp4"
You can set the max number of simultaneous tasks to 1 with: ts -S 1
To see the current queue just run ts
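Putting it together with your loop, the queue could be filled like this (a sketch; the output name out_$f is just a placeholder):
ts -S 1                                    # run at most one queued task at a time
for f in *.mp4; do
  ts sh -c "ffmpeg -i '$f' -threads 4 'out_$f'"
done
ts                                         # list the queue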
You should run it in the foreground; that way the loop will work as expected.
$ cpulimit --help
...
-f --foreground launch target process in foreground and wait for it to exit
This works for me.
for file in *.mp4; do
cpulimit -f -l 100 -- ffmpeg -i "$file" <your options>
done
If you want the -threads option to have an effect on the encoder, you should put it after the -i argument and before the output filename; your current option only tells the decoding part to use a single thread. So, to keep it all on a single thread, you want -threads 1 both before and after the -i option, like this:
ffmpeg -threads 1 -i INPUT.mp4 -threads 1 OUTPUT.mp4

"Max jobs to run" does not equal the number of jobs specified when using GNU Parallel on remote server?

I am trying to run many small serial jobs with GNU Parallel on a PBS cluster where each compute node has 16 cores. Since I intended to use multiple compute nodes, I passed the option -S $SERVERNAME to GNU Parallel. What confuses me is that the number of jobs started on the node via -S $SERVERNAME does not equal the number of jobs I specified once I try to spawn more than 9 jobs. Below are my observations:
[fchen14@shelob001 ~]$ parallel --version
GNU parallel 20160922
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.
[fchen14@shelob001 ~]$ hostname # this shows my hostname
shelob001
When using GNU Parallel on the local host without -S $SERVERNAME, there is no problem: I intended to spawn 10 jobs, and GNU Parallel started 10 jobs:
[fchen14@shelob001 ~]$ parallel --progress echo ::: `seq 1 10`
Computers / CPU cores / Max jobs to run
1:local / 16 / 10 # 10 jobs spawned, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:10/0/100%/0.0s 1
local:9/1/100%/0.0s 2
local:8/2/100%/0.0s 3
local:7/3/100%/0.0s 4
local:6/4/100%/0.0s 5
local:5/5/100%/0.0s 6
local:4/6/100%/0.0s 7
local:3/7/100%/0.0s 8
local:2/8/100%/0.0s 9
local:1/9/100%/0.0s 10
local:0/10/100%/0.0s
When I use GNU Parallel to spawn fewer than 10 jobs with -S $SERVERNAME, there is still no problem.
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 1`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 1 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:1/0/100%/0.0s 1
shelob001:0/1/100%/1.0s
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 8`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 8 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:8/0/100%/0.0s 1
shelob001:7/1/100%/1.0s 7
shelob001:6/2/100%/0.5s 3
shelob001:5/3/100%/0.3s 8
shelob001:4/4/100%/0.2s 5
shelob001:3/5/100%/0.2s 2
shelob001:2/6/100%/0.2s 6
shelob001:1/7/100%/0.1s 4
shelob001:0/8/100%/0.1s
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 9`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 9 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:9/0/100%/0.0s 1
shelob001:8/1/100%/1.0s 5
shelob001:7/2/100%/0.5s 8
shelob001:6/3/100%/0.3s 2
shelob001:5/4/100%/0.2s 6
shelob001:4/5/100%/0.2s 9
shelob001:3/6/100%/0.2s 3
shelob001:2/7/100%/0.1s 4
shelob001:1/8/100%/0.1s 7
shelob001:0/9/100%/0.1s
Here is what confuses me: when I try to use a job number >= 10, the number of jobs spawned is always one less than I wanted. Here I want to spawn 10, but only 9 jobs are started:
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 10` # I want to start 10 jobs
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 9 # why is "Max jobs to run" 9 here?
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:9/0/100%/0.0s 2
shelob001:9/1/100%/3.0s 1
shelob001:8/2/100%/1.5s 7
shelob001:7/3/100%/1.0s 4
shelob001:6/4/100%/0.8s 9
shelob001:5/5/100%/0.6s 8
shelob001:4/6/100%/0.5s 3
shelob001:3/7/100%/0.4s 5
shelob001:2/8/100%/0.4s 6
shelob001:1/9/100%/0.4s 10
shelob001:0/10/100%/0.4s
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 11`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 10 # it seems the number of jobs started is one less than I specified
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:10/0/100%/0.0s 1
shelob001:10/1/100%/3.0s 2
shelob001:9/2/100%/1.5s 8
shelob001:8/3/100%/1.0s 3
shelob001:7/4/100%/0.8s 4
shelob001:6/5/100%/0.6s 5
shelob001:5/6/100%/0.5s 7
shelob001:4/7/100%/0.4s 10
shelob001:3/8/100%/0.4s 9
shelob001:2/9/100%/0.3s 6
shelob001:1/10/100%/0.4s 11
shelob001:0/11/100%/0.4s
[fchen14@shelob001 ~]$
I checked the status of the compute node using top, and it does show that only 9 CPUs are used when I use seq 1 10. Hopefully I have made my problem clear; could anyone point out the possible cause of this problem? Any suggestion is welcome.
Thank you very much!
Looks like you found a bug. Workaround: -j+1
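For example, applied to the command from the question (a sketch only):
parallel -j+1 -S shelob001 --progress echo ::: `seq 1 10`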

GNU Parallel timeout for process

I want to use GNU Parallel for this command:
seq -w 30 | parallel -k -j6 java -javaagent:build/libs/pddl4j-3.1.0.jar -server -Xms8048m -Xmx8048m fr.uga.pddl4j.planners.hsp.HSP -o pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/domain.pddl -f pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/p{}.pddl -i 8 '>>' AstarMovie.txt
I have a timeout of 600 seconds in the Java program, but it doesn't take effect under parallel: processes can run for 2, 3, 4 or more hours and never stop.
I tried this command based on the GNU tutorial online, but it doesn't work either:
seq -w 30 | parallel -k --timeout 600000 -j6 java -javaagent:build/libs/pddl4j-3.1.0.jar -server -Xms2048m -Xmx2048m fr.uga.pddl4j.planners.hsp.HSP -o pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/domain.pddl -f pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/p{}.pddl -i 8 '>>' AstarMovie.txt
I saw in the tutorial that GNU Parallel uses milliseconds, so 600000 would be 10 minutes, which is what I need, but after 12 minutes the processes were still running. I need 6 processes to run at once for a maximum of 10 minutes each.
Any help would be great. Thanks.
The timeout for GNU Parallel is given in seconds, not milliseconds. You can test it with this snippet, which waits for 15 seconds but has a timeout that cuts it off after 10 seconds:
time parallel --timeout 10 sleep {} ::: 15
real 0m10.961s
user 0m0.071s
sys 0m0.038s
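Applied to the command from the question, that means passing --timeout 600 rather than 600000 (everything else unchanged):
seq -w 30 | parallel -k --timeout 600 -j6 java -javaagent:build/libs/pddl4j-3.1.0.jar -server -Xms2048m -Xmx2048m fr.uga.pddl4j.planners.hsp.HSP -o pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/domain.pddl -f pddl/benchmarks_STRIPS/benchmarks_STRIPS/ipc1/movie/p{}.pddl -i 8 '>>' AstarMovie.txt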

Why is a job that has been submitted using qsub reported as unknown?

I am trying to create a PBS script file to run long-term jobs on a server with 256 GB of RAM and two CPUs, each with 12 cores and 24 threads, yielding 48 computing units. I tried to do it, but I think there is something wrong.
I created a PBS script named run_trinity and submitted it to the server using the qsub command (qsub run_trinity.sh) from the same directory that contains my desired program (Trinity) and data, and it returned something like 47.chpc. But when I try to check the status of the job with the qstat command, it says: unknown job id 47.chpc. I'm a biology student and really new to this field; could you please help me figure out what happened? Here is my PBS script:
#!/bin/bash
#PBS -N run_trinity
#PBS -l nodes=1:ppn=6
#PBS -l walltime=100:00:00
#PBS -l mem=200gb
#PBS -j oe
#Set stack size to unlimited
ulimit -s unlimited
cd /home/mary/software/trinityrnaseq_r20140717
perl /home/mary/software/trinityrnaseq_r20140717/Trinity.pl --seqType fq --JM 200G --normalize_reads --left reads8_1.fq.gz --right reads8_2.fq.gz --SS_lib_type FR --CPU 6 --full_cleanup --output /home/mary/software/trinityrnaseq_r20140717
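For reference, the exact sequence I run is something like this (a sketch; the job ID is whatever qsub prints):
qsub run_trinity.sh    # prints a job ID such as 47.chpc
qstat 47.chpc          # replies: unknown job id 47.chpc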
Looking forward to hearing your perfect solutions.
