multiple srun jobs within a single sbatch killed unexpectedly

I was trying to run multiple srun jobs within a single sbatch script on a cluster. The sbatch script is as follows:
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 64
#SBATCH --time=200:00:00
#SBATCH -p amd_256
for i in {0..6} ;
do
cd ${i}
( srun -c 8 ./MD 150 20 300 20 20 0 0 > log.out 2>&1 & )
sleep 20
cd ..
done
cd 7/
srun -c 8 ./MD 100 20 300 20 20 0 0 > log.out 2>&1
cd ..
wait
In this script I submitted multiple srun job steps. The problem is that jobs 0-6 get killed as soon as the 7th job finishes. Here is the error message I got for jobs 0-6:
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** STEP 3801214.0 ON j2308 CANCELLED AT 2021-12-22T11:02:22 ***
srun: error: j2308: task 0: Terminated
Any idea on how to fix this?

The line
( srun -c 8 ./MD 150 20 300 20 20 0 0 > log.out 2>&1 & )
creates a subshell and puts the srun into the background inside that subshell. The wait call in the last line therefore doesn't know about those background processes, as they belong to a different shell/process. And since the batch script has then finished, the job is terminated.
Try this:
( srun -c 8 ./MD 150 20 300 20 20 0 0 > log.out 2>&1 ) &
As an example: Try
( sleep 60 & )
wait
and
( sleep 60 ) &
wait
to see the difference.
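Applied to the script in the question, a minimal sketch of the fixed batch file could look like this (same parameters as above; only the backgrounding is changed):
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 64
#SBATCH --time=200:00:00
#SBATCH -p amd_256
for i in {0..6} ; do
    # background the subshell itself so that the batch shell's wait tracks the step
    ( cd ${i} && srun -c 8 ./MD 150 20 300 20 20 0 0 > log.out 2>&1 ) &
    sleep 20
done
( cd 7 && srun -c 8 ./MD 100 20 300 20 20 0 0 > log.out 2>&1 ) &
wait
Since each cd happens inside its own subshell, the cd .. from the original loop is no longer needed.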

Related

sbatch error Memory specification can not be satisfied

I want to submit a sequential job, but I got:
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
This is my .sh file:
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --job-name=job-8-0
#SBATCH --mem=64000mb
#SBATCH --exclusive
module purge
module load gcc-8.3.0-gcc-4.8.5-tu6ftrf
echo "Starting job-8-0"
echo "Starting at `date`"
cd code
srun gcc -Wno-return-type file1.cpp file2.cpp file3.cpp file4.cpp file5.cpp main.cpp -o myExperiment -lstdc++ -lm
srun ./myExperiment 8 0
echo "Experiment 8-0 finished with exit code $? at: `date`"
The node login01 info is:
NodeName=login01 Arch=x86_64 CoresPerSocket=8
CPUAlloc=0 CPUErr=0 CPUTot=32 CPULoad=29.91
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:8
NodeAddr=10.0.50.0 NodeHostName=login01 Version=17.11
OS=Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018
RealMemory=1 AllocMem=0 FreeMem=65001 Sockets=2 Boards=1
State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
BootTime=2021-05-25T13:13:10 SlurmdStartTime=2021-05-25T16:35:31
CfgTRES=cpu=32,mem=1M,billing=32
AllocTRES=
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Reason=gres/gpu count too low (0 < 8) [slurm#2021-06-28T13:38:40]
There are also other nodes with FreeMem=122000, 121000, etc., which is more than 64000mb.
These are the specifications of the supercomputer:
• OS: Linux CentOS 7
• 300 compute nodes
• Each node has:
o 2 CPUs: Xeon E5-2650 8 Cores 2.000GHz (total 16 cores)
o 2 dual AMD FirePro S10000 GPUs
o Memory : 128 GB RAM
• Scheduler: Slurm
When I open nodes.conf, I see:
NodeName=login01 NodeAddr=10.0.50.0 CPUs=32 Procs=32 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=IDLE
which is the same for all nodes.
This is slurm.conf
ClusterName=sanam
ControlMachine=mgmt01
ControlAddr=10.0.1.254
SlurmUser=slurm
SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=2
GresTypes=gpu
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SelectType=select/cons_res
SelectTypeParameters=CR_Core
# SCHEDULING
SchedulerType=sched/backfill
FastSchedule=1
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
JobCompType=jobcomp/none
AccountingStorageHost=10.0.1.254
include /etc/slurm/nodes.conf
include /etc/slurm/partitions.conf
#include /etc/slurm/gres.conf
What causes the error "Memory specification can not be satisfied"?
Should I specify the RAM and CPU with srun, in particular the second srun that runs the experiment?
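For reference, memory and CPUs can be requested explicitly both in the #SBATCH header and per step on srun. A minimal sketch with illustrative values (the numbers are assumptions, not a diagnosis of the error above):
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16   # illustrative; adjust to the node's core count
#SBATCH --mem=64G            # per-node memory request
# the same limits can also be stated for an individual job step
srun --ntasks=1 --cpus-per-task=16 --mem=64G ./myExperiment 8 0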

GNU time returns different signal than it prints out

While running cron jobs and using mk-job from check_mk to monitor their results, I stumbled across this:
bash:
$ /usr/bin/time -f "time_exit: %x" timeout -s SIGKILL 2s sleep 10; echo "shell_exit: $?"
Command terminated by signal 9
time_exit: 0
shell_exit: 137
The exit code returned by /usr/bin/time differs from the exit code it writes to its formatted output:
time_exit != shell_exit
Why?
But when using a catchable signal such as SIGHUP, the exit codes match:
$ /usr/bin/time -f "time_exit: %x" timeout -s SIGHUP 2s sleep 10; echo "shell_exit: $?"
Command exited with non-zero status 124
time_exit: 124
shell_exit: 124
In the meantime I will use timeout -k 10s 2s ..., which first sends the (catchable) termination signal and, if the process is still running 10s later, a SIGKILL, in the hope that the first signal stops it properly.
Background
check_mk provides mk-job to monitor job executions. mk-job uses time to record execution times AND exit code.
man time:
The time command returns when the program exits, stops, or is terminated by a signal. If the program exited normally, the return value of time is the return value of the program it executed and measured. Otherwise, the return value is 128 plus the number of the signal which caused the program to stop or terminate.
man timeout:
... It may be necessary to use the KILL (9) signal, since this signal cannot be caught, in which case the exit status is 128+9 rather than 124.
GNU time's %x only makes sense when the process exits normally, not when it is killed by a signal.
[STEP 101] $ /usr/bin/time -f "%%x = %x" bash -c 'exit 2'; echo '$? = '$?
Command exited with non-zero status 2
%x = 2
$? = 2
[STEP 102] $ /usr/bin/time -f "%%x = %x" bash -c 'exit 137'; echo '$? = '$?
Command exited with non-zero status 137
%x = 137
$? = 137
[STEP 103] $ /usr/bin/time -f "%%x = %x" bash -c 'kill -KILL $$'; echo '$? = '$?
Command terminated by signal 9
%x = 0
$? = 137
[STEP 104] $
For time timeout -s SIGKILL 2s sleep 10, the timeout process itself is also killed by SIGKILL (KILL cannot be caught or ignored, and timeout signals its own process group), so it behaves just like bash -c 'kill -KILL $$' in my example: %x prints 0, while time's caller sees 128+9 = 137.
UPDATE:
I took a look at time's source code and found that %x blindly calls WEXITSTATUS() whether or not the process exited normally.
655 case 'x': /* Exit status. */
656 fprintf (fp, "%d", WEXITSTATUS (resp->waitstatus));
657 break;
In Git master a new %Tx has been added:
549 case 'T':
550 switch (*++fmt)
551 {
...
575 case 'x': /* exit code IF terminated normally */
576 if (WIFEXITED (resp->waitstatus))
577 fprintf (fp, "%d", WEXITSTATUS (resp->waitstatus));
578 break;
And from the Git master's time --help output:
...
%Tt exit type (normal/signalled)
%Tx numeric exit code IF exited normally
%Tn numeric signal code IF signalled
%Ts signal name IF signalled
%To 'ok' IF exited normally with code zero
...
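If a monitoring wrapper needs a reliable verdict today, one option is to ignore %x and decode time's own exit status instead; a minimal bash sketch (assuming GNU time at /usr/bin/time):
/usr/bin/time -f "elapsed: %e s" timeout -s SIGKILL 2s sleep 10
rc=$?
if [ "$rc" -gt 128 ]; then
    # per man time: 128 + signal number when the measured program was signalled
    echo "command terminated by signal $((rc - 128))"
else
    echo "command exited normally with status $rc"
fi
Note that this heuristic cannot distinguish a normal exit with status 137 from death by SIGKILL (compare STEP 102 and STEP 103 above); the new %Tt/%Tx specifiers are the cleaner fix.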

Why is bash breaking MPI job control loop

I'm attempting to use a simple bash script to sequentially run a batch of MPI jobs. This script works perfectly when running serial code (I am using Fortran 90), but for some reason bash breaks out of the loop when I attempt to execute MPI code.
I already found a work-around to the problem. I just wrote essentially the exact same script in Perl and it worked like a charm. I just really want to understand the issue here because I prefer the simplicity of bash and it perfectly fits my own scripting needs in almost all other cases.
I've tried running the MPI code as a background process and using wait, with the same result. If I run the jobs in the background without using wait, bash does not break out of the loop, but it stacks up jobs until eventually crashing. The goal is to run the executable sequentially for each parameter set anyway; I just wanted to note that the loop is not broken in that case.
Bash Script, interp.sh: Usage --> $ ./interp.sh main.x input.dat
#!/bin/bash
PROG=$1
IFILE=$2
kount=0 # Counter variable for looping through input file
sys=0 # Counter variable to store how many times model has been run
while IFS="\n" read -r line
do
kount=$(( $kount + 1 ))
if [ $(( kount % 2 )) -eq 1 ] # if kount is odd, then expect headers
then
unset name defs
sys=$(( $sys + 1 ))
name=( $line ) # parse headers
defs=${#name[*]}
k=$(( $defs - 1 ))
else # if kount is even, then expect numbers
unset vals
vals=( $line ) # parse parameters
for i in $( seq 0 $k )
do
# Define variables using header names and set their values
printf -v "${name[i]}" "${vals[i]}"
done
# Print input variable values
echo $a $b $c $d $e $nPROC
# Run executable
mpiexec -np $nPROC --oversubscribe --hostfile my_hostfile $PROG
fi
done < $IFILE
Input file, input.dat:
a b c d e nPROC
1 2 3 4 5 2
nPROC
3
nPROC
4
nPROC
5
nPROC
6
nPROC
7
nPROC
8
Sample MPI f90 code, main.f90:
program main
use mpi
implicit none
integer :: i, ierr, myID, nPROC
integer, parameter :: foolen = 100000
double precision, dimension(0:foolen) :: foo
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nPROC, ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myID, ierr)
if ( myID .eq. 0 ) then
do i=0,foolen
foo(i) = i
end do
else
do i=0,foolen
foo(i) = i
end do
end if
call MPI_FINALIZE(ierr)
end program
Sample makefile:
COMP=mpif90
EXT=f90
CFLAGs=-Wall -Wextra -Wimplicit-interface -fPIC -fmax-errors=1 -g -fcheck=all \
-fbacktrace
MPIflags=--oversubscribe --hostfile my_hostfile
PROG=main.x
INPUT=input.dat
OUTPUT=output
OBJS=main.o
$(PROG): $(OBJS)
$(COMP) $(CFLAGS) -o $(PROG) $(OBJS) $(LFLAGS)
main.o: main.f90
$(COMP) -c $(CFLAGs) main.f90
%.o: %.f90
$(COMP) -c $(CFLAGs) $<
run:
make && make clean
./interp.sh $(PROG) $(INPUT)
clean:
rm -f *.o DONE watch
my_hostfile
localhost slots=4
Note that if the mpiexec line is commented out, the script runs as expected. The output looks like this:
1 2 3 4 5 2
1 2 3 4 5 3
1 2 3 4 5 4
1 2 3 4 5 5
1 2 3 4 5 6
1 2 3 4 5 7
1 2 3 4 5 8
These are the parameter values which are supposed to be passed to the MPI code in each loop. However, when mpiexec is called in the script, only the first set of parameters is read and passed.
I apologize if all that is a bit excessive; I just wanted to provide everything needed for testing. Any help solving the issue in bash, or an explanation of why this happens, would be greatly appreciated!
mpiexec consumes stdin and thus reads all the remaining lines of the input file. So after the first iteration stdin is empty and the loop ends.
This issue occurs not only with loops that call mpiexec but also with loops around any other command that consumes stdin by default, such as ssh.
The general solution is to redirect the offending command's stdin from /dev/null so that it cannot swallow the loop's input. Some commands have a dedicated flag for this instead of the redirection, such as ssh -n.
So the solution in this case is to add the redirection at the end of the line where mpiexec is called:
mpiexec -np $nPROC --oversubscribe --hostfile my_hostfile $PROG < /dev/null
There are some mpiexec-specific standard I/O issues to pay attention to, detailed here: https://www.open-mpi.org/doc/v3.0/man1/mpiexec.1.php#toc14
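Another common workaround (a sketch, not part of the original answer) is to feed the loop through a separate file descriptor so that stdin stays free for the commands inside it:
while IFS= read -r line <&3; do
    # ... parse $line into name/vals exactly as in interp.sh ...
    mpiexec -np $nPROC --oversubscribe --hostfile my_hostfile $PROG   # stdin untouched
done 3< "$IFILE"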

Execute a code for 5 seconds and then stop it

The goal is to generate random TCP traffic with iperf and capture it over 5, 10, 15 and 20 seconds using tcpdump. Capturing the throughput is also important to me. My problem is that I would like to execute code1, code2, code3 and code4 for 5, 10, 15 and 20 seconds in bash, but I don't know how to express that condition. Here is my code:
for Test_duration in 5 10 15 20
do
echo “Test performing with $Test_duration duration”
sudo tcpdump -G 10 -W 2 -w /tmp/scripttest_$Test_duration -i h1-eth0 &
while true; do
#code1
time2=$(($RANDOM%20+1))&
pksize2=$(($RANDOM%1000+200))&
iperf -c 10.0.0.2 -t $time2 -r -l $pksize2 >> /media/sf_sharedsaeed/throughtput/iperthroughput_host2_$Test_duration.txt &\
#code2
time3=$(($RANDOM%20+1))&
pksize3=$(($RANDOM%1000+200))&
iperf -c 10.0.0.3 -t $time3 -r -l $pksize3 >> /media/sf_sharedsaeed/throughtput/iperthroughput_host3_$Test_duration.txt &\
#code3
time4=$(($RANDOM%20+1))&
pksize4=$(($RANDOM%1000+200))&
iperf -c 10.0.0.4 -t $time4 -r -l $pksize4 >> /media/sf_sharedsaeed/throughtput/iperthroughput_host4_$Test_duration.txt &\
#code4
time5=$(($RANDOM%20+1))&
pksize5=$(($RANDOM%1000+200))&
iperf -c 10.0.0.5 -t $time5 -r -l $pksize5 >> /media/sf_sharedsaeed/throughtput/iperthroughput_host5_$Test_duration.txt &\
done
done
Another constraint is that code1, code2, code3 and code4 should be executed at the same time, so I used &.
What should I replace while true; with so that the codes run for exactly the given duration? Can anybody help me?
You could do that by using a background subshell that creates a simple flag/lock file when the timeout expires, which you detect from your while loop. Here is an example based on a simplified version of your code:
for Test_duration in 5 10 15 20
do
# TIMEOUT_LOCK will be your file lock
rm -f TIMEOUT_LOCK
# next command will run in a parallel subshell at the background
(sleep $Test_duration; touch TIMEOUT_LOCK) &
echo “Test performing with $Test_duration duration”
while true; do
# check whether the current timeout (5 or 10 or 15 ...) has occurred
if [ -f TIMEOUT_LOCK ]; then rm -f TIMEOUT_LOCK; break; fi
# do your stuff here - I'm just outputing dots and sleeping
echo -n "."
sleep 1
done
echo ""
done
The output of this code is:
“Test performing with 5 duration”
.....
“Test performing with 10 duration”
..........
“Test performing with 15 duration”
...............
“Test performing with 20 duration”
....................
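An alternative sketch (not from the original answer) avoids the flag file by using bash's built-in SECONDS counter:
for Test_duration in 5 10 15 20; do
    echo "Test performing with $Test_duration duration"
    SECONDS=0                              # bash increments SECONDS once per second
    while (( SECONDS < Test_duration )); do
        # start the iperf commands here, then pace the loop
        echo -n "."
        sleep 1
    done
    echo ""
done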

wait not working in shell script?

I am running a for loop in which a command is run in the background using &. At the end I want the return values of all the commands.
Here is the code i tried
for((i=0 ;i<3;i++)) {
# curl command which returns a value &
}
wait
# next piece of code
I want to get all three returned values and then proceed. But the wait command does not wait for the background processes to complete and runs the next piece of code. I need the returned values to proceed.
Shell builtins have documentation accessible with help BUILTIN_NAME.
help wait yields:
wait: wait [-n] [id ...]
Wait for job completion and return exit status.
Waits for each process identified by an ID, which may be a process ID or a
job specification, and reports its termination status. If ID is not
given, waits for all currently active child processes, and the return
status is zero. If ID is a job specification, waits for all processes
in that job's pipeline.
If the -n option is supplied, waits for the next job to terminate and
returns its exit status.
Exit Status:
Returns the status of the last ID; fails if ID is invalid or an invalid
option is given.
which implies that to get the return statuses, you need to save the pid and then wait on each pid, using wait $THE_PID.
Example:
sl() { sleep $1; echo $1; return $(($1+42)); }
pids=(); for((i=0;i<3;i++)); do sl $i & pids+=($!); done;
for pid in "${pids[@]}"; do wait $pid; echo ret=$?; done
Example output:
0
ret=42
1
ret=43
2
ret=44
Edit:
With curl, don't forget to pass -f (--fail) to make sure the process will fail if the HTTP request did:
CURL Example:
#!/bin/bash
URIs=(
https://pastebin.com/raw/w36QWU3D
https://pastebin.com/raw/NONEXISTENT
https://pastebin.com/raw/M9znaBB2
)
pids=(); for((i=0;i<3;i++)); do
curl -fL "${URIs[$i]}" &>/dev/null &
pids+=($!)
done
for pid in "${pids[#]}"; do
wait $pid
echo ret=$?
done
CURL Example output:
ret=0
ret=22
ret=0
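As a side note, if your bash is 4.3 or newer (an assumption on my part), the -n option mentioned in the help text above can collect statuses in completion order without waiting on each pid individually; a minimal sketch:
pids=(); for((i=0;i<3;i++)); do sleep $i & pids+=($!); done
for ((j=0; j<${#pids[@]}; j++)); do
    wait -n                    # exit status of whichever background job finishes next
    echo "ret=$?"
done
This reports statuses in completion order rather than by pid; use wait $pid as shown above if you need to know which command each status belongs to.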
GNU Parallel is a great way to do high-latency things like curl in parallel.
parallel curl --head {} ::: www.google.com www.hp.com www.ibm.com
Or, filtering results:
parallel curl --head -s {} ::: www.google.com www.hp.com www.ibm.com | grep '^HTTP'
HTTP/1.1 302 Found
HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
Here is another example:
parallel -k 'echo -n Starting {} ...; sleep 5; echo done.' ::: 1 2 3 4
Starting 1 ...done.
Starting 2 ...done.
Starting 3 ...done.
Starting 4 ...done.
